Java - Performing more than one reduction in a single pass
What is the idiom for performing more than one reduction in a single pass of a stream? Is it just to have one big reducer class, even if that violates SRP when more than one type of reduction computation is required?
Presumably you want to avoid making multiple passes, since the pipeline stages might be expensive. Or you want to avoid collecting the intermediate values in order to run them through multiple collectors, since the cost of storing all the values might be too high.
As Brian Goetz noted, Collectors.summarizingInt will collect int values and perform multiple reductions on them, returning an aggregate structure called IntSummaryStatistics. There are similar collectors for summarizing double and long values.
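As a minimal sketch of what that looks like in practice, the following collects the lengths of a few words and reads several reductions off the single IntSummaryStatistics result. The class name SummarizingExample, the helper lengthStats, and the sample words are made up for illustration.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

public class SummarizingExample {
    // Hypothetical helper: count, sum, min, max and average of the
    // word lengths are all gathered in one pass over the stream.
    static IntSummaryStatistics lengthStats(List<String> words) {
        return words.stream()
                    .collect(Collectors.summarizingInt(String::length));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("aardvark", "bison", "crocodile");
        IntSummaryStatistics stats = lengthStats(words);
        System.out.println(stats.getCount()); // 3
        System.out.println(stats.getSum());   // 22
        System.out.println(stats.getMin());   // 5
        System.out.println(stats.getMax());   // 9
    }
}
```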
Unfortunately these perform only a fixed set of reductions, so if you want reductions different from the ones they do, you have to write your own collector.
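For illustration, here is one way such a hand-written collector might look, built with Collector.of. It is only a sketch: the container class LengthStats and the two reductions it performs (total length and longest word) are hypothetical choices, not something from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

public class CustomCollectorExample {
    // Hypothetical mutable container holding two unrelated reductions.
    public static final class LengthStats {
        private long totalLength;
        private String longest = "";

        // Accumulator: fold one element into both reductions.
        void accept(String s) {
            totalLength += s.length();
            if (s.length() > longest.length()) longest = s;
        }

        // Combiner: merge a partial result (used by parallel streams).
        LengthStats combine(LengthStats other) {
            totalLength += other.totalLength;
            if (other.longest.length() > longest.length()) longest = other.longest;
            return this;
        }

        public long totalLength() { return totalLength; }
        public String longest() { return longest; }
    }

    // Supplier, accumulator and combiner are passed as method references.
    public static Collector<String, LengthStats, LengthStats> lengthStats() {
        return Collector.of(LengthStats::new, LengthStats::accept, LengthStats::combine);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("aardvark", "bison", "crocodile");
        LengthStats stats = words.stream().collect(lengthStats());
        System.out.println(stats.totalLength()); // 22
        System.out.println(stats.longest());     // crocodile
    }
}
```

The drawback is the one raised in the question: every additional reduction has to be added to the same container class.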
Here's a technique for using multiple, unrelated collectors in a single pass. We can use peek() to take a crack at every value going through the stream, passing it through undisturbed. The peek() operation takes a Consumer, so we need a way to adapt a Collector to a Consumer. The Consumer will be the Collector's accumulator function. But we also need to call the Collector's supplier function and store the object it creates, so that it can be passed to the accumulator function. And we need a way to get the result out of the Collector. To do this, we'll wrap the Collector in a little helper class:
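    import java.util.function.Consumer;
    import java.util.stream.Collector;

    public class PeekingCollector<T,A,R> {
        final Collector<T,A,R> collector;
        final A acc;   // the accumulation container created by the supplier

        public PeekingCollector(Collector<T,A,R> collector) {
            this.collector = collector;
            this.acc = collector.supplier().get();
        }

        // Adapts the collector's accumulator to a Consumer suitable for peek().
        public Consumer<T> peek() {
            if (collector.characteristics().contains(Collector.Characteristics.CONCURRENT))
                return t -> collector.accumulator().accept(acc, t);
            else
                return t -> {
                    // Non-concurrent collectors need external synchronization.
                    synchronized (this) {
                        collector.accumulator().accept(acc, t);
                    }
                };
        }

        // Applies the finisher to produce the collector's final result.
        public synchronized R get() {
            return collector.finisher().apply(acc);
        }
    }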
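To use this, we first have to create the wrapped collector and hang onto it. Then we run the pipeline and call peek(), passing it the consumer returned by the wrapped collector's own peek() method. Finally we call get() on the wrapped collector to obtain its result. Here's a simple example that filters and sorts some words, while also grouping them by their first letter: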
    // Assumes static imports of Collectors.groupingBy and Collectors.toList.
    List<String> input = Arrays.asList(
        "aardvark", "crocodile", "antelope",
        "buffalo", "bustard", "cockatoo",
        "capybara", "bison", "alligator");

    PeekingCollector<String,?,Map<String,List<String>>> grouper =
        new PeekingCollector<>(groupingBy(s -> s.substring(0, 1)));

    List<String> output = input.stream()
                               .filter(s -> s.length() > 5)
                               .peek(grouper.peek())
                               .sorted()
                               .collect(toList());

    Map<String,List<String>> groups = grouper.get();
    System.out.println(output);
    System.out.println(groups);
The output is:

    [aardvark, alligator, antelope, buffalo, bustard, capybara, cockatoo, crocodile]
    {a=[aardvark, antelope, alligator], b=[buffalo, bustard], c=[crocodile, cockatoo, capybara]}
It's a bit cumbersome, as you have to write out the generic types for the wrapped collector (which is a bit unusual; they're usually inferred). But if the expense of processing or storing the stream values is great enough, perhaps it's worth the trouble.
Finally, note that peek() can be called from multiple threads if the stream is run in parallel. For this reason, non-thread-safe collectors must be protected by a synchronized block. If the collector is thread-safe, we needn't synchronize around calling it. To determine this, we check the collector's CONCURRENT characteristic. If you run a parallel stream, it's preferable to place a concurrent collector (such as groupingByConcurrent or toConcurrentMap) within the peek operation; otherwise the synchronization within the wrapped collector may cause a bottleneck and slow down the entire stream.
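A rough sketch of the parallel case, under a few assumptions: the helper class is repeated in compact form so the example stands alone, the method name countByFirstLetter is hypothetical, and the side reduction (counting words per first letter with groupingByConcurrent) was picked because its result does not depend on the order in which threads deliver elements.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Consumer;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class ParallelPeekExample {
    // Compact copy of the PeekingCollector helper described above.
    static class PeekingCollector<T, A, R> {
        final Collector<T, A, R> collector;
        final A acc;

        PeekingCollector(Collector<T, A, R> collector) {
            this.collector = collector;
            this.acc = collector.supplier().get();
        }

        Consumer<T> peek() {
            if (collector.characteristics().contains(Collector.Characteristics.CONCURRENT))
                return t -> collector.accumulator().accept(acc, t);
            return t -> {
                synchronized (this) {
                    collector.accumulator().accept(acc, t);
                }
            };
        }

        synchronized R get() {
            return collector.finisher().apply(acc);
        }
    }

    // Hypothetical helper: counts the long words per first letter as a side
    // reduction while the main pipeline collects them into a list.
    static ConcurrentMap<String, Long> countByFirstLetter(List<String> words) {
        PeekingCollector<String, ?, ConcurrentMap<String, Long>> counter =
            new PeekingCollector<>(Collectors.groupingByConcurrent(
                (String s) -> s.substring(0, 1), Collectors.counting()));

        List<String> longWords = words.parallelStream()
                                      .filter(s -> s.length() > 5)
                                      .peek(counter.peek())
                                      .collect(Collectors.toList());

        // Call get() exactly once, after the terminal operation has finished.
        return counter.get();
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "aardvark", "crocodile", "antelope",
            "buffalo", "bustard", "cockatoo",
            "capybara", "bison", "alligator");
        System.out.println(countByFirstLetter(input));
    }
}
```

Because groupingByConcurrent reports the CONCURRENT characteristic, the wrapper takes the unsynchronized branch here and leaves thread safety to the collector itself.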