Re: org.apache.accumulo.core.iterators.Combiner: key scope?
Aaron Cordova 2012-03-19, 20:09
I suppose this would be a bad time to bring up the idea of returning more than one Pair...
The original semantics of reduce() in Lisp are to compact everything down into one object, but the original MapReduce semantics allow map and reduce functions to emit() as many new KV pairs as desired. Bringing Accumulo's reduce() function closer to MapReduce's reduce() might not add a huge amount of cognitive load for users, especially those coming from the MapReduce world.
However, I am strongly in favor of avoiding over-generalized and complicated APIs, and am certainly willing to deal with the constraint of only returning one Pair if everyone feels this will keep adoption and usage easy and simple.
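To make the trade-off concrete, here is a rough sketch of what an emit-style reduce could look like. It does not use any Accumulo classes: `EmittingReducer`, `SUM_AND_COUNT`, and the String/Long key and value types are hypothetical names chosen only for illustration, not an existing or proposed Accumulo API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

final class EmitSketch {
    // Hypothetical interface: the reducer may call emit zero or more times,
    // instead of returning exactly one pair.
    interface EmittingReducer<K, V> {
        void reduce(K key, Iterator<V> values, BiConsumer<K, V> emit);
    }

    // Example reducer that emits two pairs per input key: a sum and a count.
    static final EmittingReducer<String, Long> SUM_AND_COUNT = (key, values, emit) -> {
        long sum = 0;
        long count = 0;
        while (values.hasNext()) {
            sum += values.next();
            count++;
        }
        emit.accept(key + ":sum", sum);
        emit.accept(key + ":count", count);
    };

    // Runs the reducer over three values and collects everything it emits.
    static List<String> demo() {
        List<String> out = new ArrayList<>();
        SUM_AND_COUNT.reduce("row1", Arrays.asList(1L, 2L, 3L).iterator(),
                (k, v) -> out.add(k + "=" + v));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The single-Pair constraint corresponds to a reducer that calls emit exactly once, so the simpler API is a special case of this one.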
On Mar 19, 2012, at 4:02 PM, Keith Turner wrote:
> On Mon, Mar 19, 2012 at 3:50 PM, Billie J Rinaldi
> <[EMAIL PROTECTED]> wrote:
>> Another thing to consider is what to do with the differing column qualifiers. Throw them away, returning a blank column qualifier on the single Key returned? What if we want to combine column qualifiers and ignore Values instead? Should we try to pass the qualifiers into a reduce method with the Values? That would be a more general approach, but I'm not sure how to create an API that wouldn't be messy.
> The following API might address the issues you raised:
> public abstract Pair<Key, Value> reduce(Iterator<Pair<Key,Value>> iter)
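As an illustration of how the signature quoted above could be used, here is a minimal self-contained sketch. It deliberately avoids the real Accumulo types: `SimpleKey`, `Pair`, `PairReducer`, and `SumIgnoringQualifier` are simplified stand-ins invented for this example, and values are plain `Long`s rather than Accumulo `Value` objects. The reducer sums the values and answers the column-qualifier question by returning a blank qualifier, which is only one of the options discussed.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class ReduceSketch {
    // Hypothetical stand-in for a key: row, column family, column qualifier only.
    static final class SimpleKey {
        final String row, family, qualifier;
        SimpleKey(String row, String family, String qualifier) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
        }
    }

    // Hypothetical minimal pair type.
    static final class Pair<A, B> {
        final A first;
        final B second;
        Pair(A first, B second) {
            this.first = first;
            this.second = second;
        }
    }

    // Same shape as the quoted signature: the reducer sees whole (key, value)
    // pairs, so it can decide what to do with differing qualifiers.
    static abstract class PairReducer {
        abstract Pair<SimpleKey, Long> reduce(Iterator<Pair<SimpleKey, Long>> iter);
    }

    // Sums the values; keeps the row and family of the first key and
    // returns a blank column qualifier.
    static final class SumIgnoringQualifier extends PairReducer {
        @Override
        Pair<SimpleKey, Long> reduce(Iterator<Pair<SimpleKey, Long>> iter) {
            SimpleKey firstKey = null;
            long sum = 0;
            while (iter.hasNext()) {
                Pair<SimpleKey, Long> p = iter.next();
                if (firstKey == null) {
                    firstKey = p.first;
                }
                sum += p.second;
            }
            if (firstKey == null) {
                return null; // nothing to combine
            }
            return new Pair<SimpleKey, Long>(
                    new SimpleKey(firstKey.row, firstKey.family, ""), sum);
        }
    }

    // Combines three entries of one row that differ only in qualifier.
    static Pair<SimpleKey, Long> demo() {
        List<Pair<SimpleKey, Long>> input = Arrays.asList(
                new Pair<SimpleKey, Long>(new SimpleKey("row1", "cf", "a"), 1L),
                new Pair<SimpleKey, Long>(new SimpleKey("row1", "cf", "b"), 2L),
                new Pair<SimpleKey, Long>(new SimpleKey("row1", "cf", "c"), 3L));
        return new SumIgnoringQualifier().reduce(input.iterator());
    }

    public static void main(String[] args) {
        Pair<SimpleKey, Long> result = demo();
        System.out.println(result.first.row + " " + result.first.family
                + " '" + result.first.qualifier + "' -> " + result.second);
    }
}
```

A reducer that wanted to combine the qualifiers and ignore the values could do so under the same signature, since both halves of each pair are visible to it.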