java - Parallelize a collection with Spark
I'm trying to parallelize a collection with Spark, but the example in the documentation doesn't seem to work:
List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
JavaRDD<Integer> distData = sc.parallelize(data);
I'm creating a list of LabeledPoint records, each of which contains the data points (a double[]) and a label (defaulted: true/false).
public List<LabeledPoint> createLabeledPoints(List<ESRecord> records) {
    List<LabeledPoint> points = new ArrayList<>();
    for (ESRecord rec : records) {
        points.add(new LabeledPoint(
                rec.defaulted ? 1.0 : 0.0,
                Vectors.dense(rec.toDataPoints())));
    }
    return points;
}

public void test(List<ESRecord> records) {
    SparkConf conf = new SparkConf().setAppName("SVM Classifier Example");
    SparkContext sc = new SparkContext(conf);
    List<LabeledPoint> points = createLabeledPoints(records);
    JavaRDD<LabeledPoint> data = sc.parallelize(points);
    ...
}
The function signature of parallelize no longer takes one parameter; here is how it looks in spark-mllib_2.11 v1.3.0: sc.parallelize(seq, numSlices, evidence$1)
So any ideas on how to get this working?
In Java, you should use JavaSparkContext.
https://spark.apache.org/docs/0.6.2/api/core/spark/api/java/javasparkcontext.html
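To illustrate, here is a minimal sketch of the questioner's test method rewritten against JavaSparkContext, whose parallelize(List<T>) overload takes a plain java.util.List and has no Scala implicit parameters. ESRecord, its defaulted field, and its toDataPoints() method are taken from the question and assumed to exist as shown; the app name and class name are placeholders.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.regression.LabeledPoint;

public class SVMClassifierExample {

    // Unchanged from the question: convert each ESRecord into a LabeledPoint.
    public static List<LabeledPoint> createLabeledPoints(List<ESRecord> records) {
        List<LabeledPoint> points = new ArrayList<>();
        for (ESRecord rec : records) {
            points.add(new LabeledPoint(
                    rec.defaulted ? 1.0 : 0.0,
                    Vectors.dense(rec.toDataPoints())));
        }
        return points;
    }

    public static void test(List<ESRecord> records) {
        SparkConf conf = new SparkConf().setAppName("SVM Classifier Example");

        // JavaSparkContext wraps SparkContext and exposes a Java-friendly
        // parallelize(List<T>) that returns a JavaRDD<T> directly.
        JavaSparkContext sc = new JavaSparkContext(conf);
        try {
            List<LabeledPoint> points = createLabeledPoints(records);
            JavaRDD<LabeledPoint> data = sc.parallelize(points);
            // ... train/evaluate the classifier on `data` here ...
        } finally {
            sc.stop();
        }
    }
}
```

The Scala SparkContext.parallelize the questioner hit takes a Scala Seq plus an implicit ClassTag (the evidence$1 parameter), which is why it is awkward to call from Java; JavaSparkContext hides both.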