scala - Apache Spark RDD - not updating -

- September 15, 2013

I create a PairRDD which contains a Vector.

var newRDD = oldRDD.mapValues(listOfItemsAndRatings => Vector(Array.fill(2){math.random}))

Later on I update the RDD:

newRDD.lookup(ratingObject.user)(0) += 0.2 * (errorRate(rating) * myVector)

However, although this outputs an updated Vector (as shown in the console), when I next call newRDD I can see that the Vector value has changed. Through testing I have concluded that it has changed to something given by math.random, because every time I call newRDD the Vector changes. I understand there is a lineage graph, and maybe that has something to do with it. I need to update the Vector held in the RDD to new values, and I need to do this repeatedly.

thanks.

RDDs are immutable structures meant to distribute operations on data over a cluster. There are two elements playing a role in the behavior you are observing here:

First, an RDD's lineage may be recomputed every time. In this case, it means that an action on newRDD might trigger the lineage computation, therefore applying the Vector(Array.fill(2){math.random}) transformation again and resulting in new values each time. This recomputation can be avoided by using cache, in which case the result of the transformation is kept in memory and/or on disk after the first time it is computed. This results in:

val randomVectorRDD = oldRDD.mapValues(listOfItemsAndRatings => Vector(Array.fill(2){math.random}))
randomVectorRDD.cache()
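To make the recomputation visible, here is a minimal sketch that looks up the same key twice, first on an uncached RDD and then on a cached one. It assumes a local Spark installation on the classpath. The sample data, the object name CacheLineageSketch and the use of a plain Array[Double] in place of Vector are assumptions made for illustration and are not from the original question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.util.Random

object CacheLineageSketch {
  // Returns (uncached lookups agree, cached lookups agree).
  def run(): (Boolean, Boolean) = {
    val conf = new SparkConf().setAppName("cache-lineage-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical stand-in for oldRDD: user -> list of (item, rating).
      val oldRDD = sc.parallelize(Seq(
        "alice" -> Seq(("item1", 4.0)),
        "bob"   -> Seq(("item2", 3.0))
      ))

      // No cache: every action re-runs mapValues and draws fresh random numbers.
      val uncached = oldRDD.mapValues(_ => Array.fill(2)(Random.nextDouble()))
      val uncachedSame =
        uncached.lookup("alice").head.toSeq == uncached.lookup("alice").head.toSeq

      // With cache: the first action materializes the values, later actions reuse them.
      val cached = oldRDD.mapValues(_ => Array.fill(2)(Random.nextDouble())).cache()
      cached.count() // force the first computation
      val cachedSame =
        cached.lookup("alice").head.toSeq == cached.lookup("alice").head.toSeq

      (uncachedSame, cachedSame)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (uncachedSame, cachedSame) = run()
    println(s"uncached lookups agree: $uncachedSame")
    println(s"cached lookups agree:   $cachedSame")
  }
}
```

Note that caching is best effort: if a cached partition is evicted or lost, Spark recomputes it from the lineage, and a random transformation would then produce different values again. Where that matters, generating the random vectors once and storing them, or seeding the generator per key, may be a safer choice.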

The second aspect that needs further consideration is the in-place mutation:

newRDD.lookup(ratingObject.user)(0) += 0.2 * (errorRate(rating) * myVector)

Although this might work on a single machine because all Vector references are local, it will not scale to a cluster: the values returned by lookup are serialized copies, so mutations to them are not preserved in the RDD. This raises the question of why Spark is being used for this.

To be implemented on Spark, this algorithm will need to be redesigned so that it is expressed in terms of transformations instead of individual lookups and mutations.
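One possible shape for such a redesign is sketched below: the per-user corrections are collected into their own RDD, summed per key, and joined with the current vectors to produce a new RDD on each iteration. This is only an illustration under assumptions. The deltas RDD stands in for the errorRate(rating) * myVector terms of the question, whose definitions are not shown; the sample data, the names TransformUpdateSketch, add and step, and the use of Array[Double] instead of Vector are hypothetical. Only the 0.2 step size comes from the original code.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object TransformUpdateSketch {
  // Element-wise sum of two equally sized vectors.
  def add(a: Array[Double], b: Array[Double]): Array[Double] =
    a.zip(b).map { case (x, y) => x + y }

  // One update step: builds a NEW RDD instead of mutating the old one.
  def step(vectors: RDD[(String, Array[Double])],
           deltas: RDD[(String, Array[Double])]): RDD[(String, Array[Double])] = {
    val summed = deltas.reduceByKey(add) // combine all corrections per user
    vectors.leftOuterJoin(summed).mapValues {
      case (v, Some(d)) => add(v, d.map(_ * 0.2)) // 0.2 step size from the question
      case (v, None)    => v                      // users without corrections are unchanged
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("transform-update-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical starting vectors and corrections.
      var vectors: RDD[(String, Array[Double])] =
        sc.parallelize(Seq("alice" -> Array(1.0, 1.0), "bob" -> Array(0.5, 0.5))).cache()
      val deltas = sc.parallelize(Seq("alice" -> Array(1.0, -1.0), "alice" -> Array(0.5, 0.5)))

      for (_ <- 1 to 3) {
        val next = step(vectors, deltas).cache()
        next.count()        // materialize before dropping the previous version
        vectors.unpersist()
        vectors = next
      }

      vectors.collect().foreach { case (user, v) =>
        println(s"$user -> ${v.mkString("[", ", ", "]")}")
      }
    } finally {
      sc.stop()
    }
  }
}
```

Each pass replaces the reference held in vectors with a freshly computed RDD, which fits the immutability of RDDs. With many iterations the lineage keeps growing, so periodically calling checkpoint() on the RDD may be worth considering.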
