Re: org.apache.accumulo.core.iterators.Combiner: key scope?
Aaron Cordova 2012-03-20, 12:49
On Mar 19, 2012, at 4:28 PM, Keith Turner wrote:
> On Mon, Mar 19, 2012 at 4:09 PM, Aaron Cordova <[EMAIL PROTECTED]> wrote:
>> I suppose this would be a bad time to bring up the idea of returning more than one Pair...
>> The original semantics of reduce() from Lisp are to compact everything down into one object, but the original MapReduce semantics allow the reduce and map functions to emit() as many new KV pairs as one desires. Bringing Accumulo's reduce() function closer to MapReduce's reduce() might not add much cognitive load for users, especially those coming from the MapReduce world.
>> However, I am strongly in favor of avoiding over-generalized and complicated APIs, and am certainly willing to deal with the constraint of only returning one Pair if everyone feels this will keep adoption and usage easy and simple.
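To make the two options concrete, here is a minimal Java sketch. It does not use Accumulo's classes: `SumReducers`, `reduceToOne`, `reduceToMany` and the `Emitter` callback are hypothetical names chosen for illustration. `reduceToOne` follows the shape of the current combiner, where the caller supplies the key and the user returns exactly one value; `reduceToMany` follows the MapReduce style, where the user may emit any number of key/value pairs under keys of their own choosing.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class SumReducers {

    // Hypothetical callback for the multi-output style; not an Accumulo API.
    public interface Emitter {
        void emit(String key, long value);
    }

    // Single-output style: the key is fixed by the caller and only one value
    // comes back, so the user has no way to produce an out-of-order key.
    public static long reduceToOne(String key, Iterator<Long> values) {
        long sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    // Multi-output style: the reducer picks both keys and values and may emit
    // any number of pairs. Here it emits a count and a sum under derived keys,
    // purely for illustration.
    public static void reduceToMany(String key, Iterator<Long> values, Emitter out) {
        long sum = 0;
        long count = 0;
        while (values.hasNext()) {
            sum += values.next();
            count++;
        }
        out.emit(key + ":count", count);
        out.emit(key + ":sum", sum);
    }

    public static void main(String[] args) {
        List<Long> vals = Arrays.asList(1L, 2L, 3L);
        System.out.println(reduceToOne("row1", vals.iterator())); // prints 6

        List<Map.Entry<String, Long>> emitted = new ArrayList<>();
        reduceToMany("row1", vals.iterator(), (k, v) -> emitted.add(new SimpleEntry<>(k, v)));
        System.out.println(emitted); // prints [row1:count=3, row1:sum=6]
    }
}
```

The trade-off in the thread shows up directly in the signatures: the single-value form is simpler and cannot misorder keys, while the emitter form is more general but hands key ordering over to the user.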
> I think that reducing to multiple pairs is ok. The important part is
> getting the API right. What API were you thinking of? Even if we do
> not do it, it's nice to explore it and know what our options are.
> One thing that I realized about returning a key or keys is that it
> gives the user a chance to return something out of sorted order. This
> is a difference from the MapReduce model: the output of a MapReduce
> reducer need not be sorted.
Right, but that's true of the output of Map() too, and the framework just sorts the KV pairs for you.
However, I don't see a good way for Accumulo to maintain global sort order over a list of KV pairs from reduce() now, so maybe that's reason enough not to do it.
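The ordering concern can be illustrated with a minimal Java sketch, assuming a hypothetical `OrderCheckingEmitter` that a framework might wrap around a multi-output reducer; nothing like it is claimed to exist in Accumulo. It compares each emitted key with the previous one using plain `String` ordering, which is a simplification: real Accumulo keys compare on several fields (row, column family, qualifier, visibility, timestamp). An out-of-order key is rejected only at the moment it is emitted, which is the runtime-only failure mode Keith points out, and the check covers a single output stream, not global order.

```java
import java.util.ArrayList;
import java.util.List;

public class OrderCheckingEmitter {

    // Hypothetical sink for reducer output; not part of Accumulo's API.
    private final List<String> output = new ArrayList<>();
    private String lastKey = null;

    // Accepts a key only if it does not sort before the previously emitted key.
    // A bad key is detected only when it is emitted, i.e. at runtime.
    public void emit(String key, long value) {
        if (lastKey != null && key.compareTo(lastKey) < 0) {
            throw new IllegalStateException(
                "key " + key + " emitted after " + lastKey + " breaks sort order");
        }
        lastKey = key;
        output.add(key + "=" + value);
    }

    public List<String> output() {
        return output;
    }

    public static void main(String[] args) {
        OrderCheckingEmitter ok = new OrderCheckingEmitter();
        ok.emit("row1:count", 3);
        ok.emit("row1:sum", 6);
        System.out.println(ok.output()); // prints [row1:count=3, row1:sum=6]

        OrderCheckingEmitter bad = new OrderCheckingEmitter();
        bad.emit("row1:sum", 6);
        try {
            bad.emit("row1:count", 3); // "count" sorts before "sum"
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```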
> If the user generates keys out of order,
> this will not be caught until runtime. The API on the current
> combiner does not give control over the key. So that prevents this