apache pig - PIG: Do not understand why AVG does not work when COUNT does
I am running the following set of commands in Pig. The data set has one row for each student in the class, and each student has a number of grades. The student name is tab-separated from the grades for that student, and the scores for each student are comma-separated. I need to find the average grade for each student. After grouping, I can get a count of the grades for each student, but I cannot get the average score for each student. Pig complains that it cannot open an iterator when averaging. I am confused, since the iterator for both aggregate functions, COUNT and AVG, is the same. I am not sure what I am missing. Any help would be appreciated.
Scripts:
grunt> a = load 'grades.txt' using PigStorage('\t') as (f1:chararray,f2:chararray);
grunt> dump a;
(s14,59,94,81)
(s15,60,77)
(s16,77,77)
(s17,76,76)
(s18,19,61,72)
(s20,34,35)
grunt> b = foreach a generate f1 as stu, flatten(TOKENIZE(f2)) as (grade:int);
grunt> describe b;
b: {stu: chararray,grade: int}
grunt> dump b;
(s14,59)
(s14,94)
(s14,81)
(s15,60)
(s15,77)
(s16,77)
(s16,77)
(s17,76)
(s17,76)
(s18,19)
(s18,61)
(s18,72)
(s20,34)
(s20,35)
grunt> grp = group b by stu;
grunt> cnt = foreach grp generate group, COUNT(b.grade);
grunt> dump cnt;
(s14,3)
(s15,2)
(s16,2)
(s17,2)
(s18,3)
(s20,2)
grunt> avg = foreach grp generate group, AVG(b.grade);
grunt> dump avg;
2015-03-20 21:56:30,900 ERROR org.apache.pig.tools.pigstats.PigStatsUtil: 1 map reduce job(s) failed!
2015-03-20 21:56:30,907 ERROR org.apache.pig.tools.grunt.Grunt: ERROR 1066: Unable to open iterator for alias avg
Details at logfile: /home/training/pig/pig_1426902869706.log
grunt>
As mentioned in the comments, a workaround was found:
Changed
b = foreach a generate f1 as stu, flatten(TOKENIZE(f2)) as (grade:int);
to
b = foreach a generate f1 as stu, flatten(TOKENIZE(f2)) as grade;
and copied the bag into:
c = foreach b generate stu as stu, (int)grade as grade;
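Putting the workaround together, the whole script might look like the sketch below. It assumes the same grades.txt layout as in the question, and it assumes the grouping and the average are then computed on c rather than b, which the workaround implies but does not spell out. A plausible reading of the failure, not confirmed in this thread, is that declaring (grade:int) on the flattened TOKENIZE output only changes the declared schema and leaves the tokens as chararray values at run time. COUNT only counts items, so it would not care, while AVG needs real numbers.

```pig
-- Load one row per student: name, then a comma-separated list of grades.
a = load 'grades.txt' using PigStorage('\t') as (f1:chararray, f2:chararray);

-- Split the grade list into one row per grade; leave the token untyped here.
b = foreach a generate f1 as stu, flatten(TOKENIZE(f2)) as grade;

-- Cast each token explicitly so that grade really is an int.
c = foreach b generate stu as stu, (int)grade as grade;

-- Group the typed relation (assumption: group c, not b) and aggregate.
grp = group c by stu;
cnt = foreach grp generate group, COUNT(c.grade);
avg = foreach grp generate group, AVG(c.grade);

dump cnt;
dump avg;
```

With the sample data above, s14 should come out at an average of 78, since (59 + 94 + 81) / 3 = 78.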