On Mon, May 23, 2011 at 11:32 AM, Mike Spreitzer <[EMAIL PROTECTED]> wrote:
> What happens if one invocation of a combiner outputs more than one value?
> What happens if an output key is different from the input key?
The combiner must preserve the sort order (and partitioning)
established before it runs. So given a record (k,v), it can emit any
number of records whose keys compare equal to k under the user-defined
comparator, even if they are not byte-for-byte identical to k. Note
that the grouping comparator affects this constraint.
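
To make the "equal to, but not necessarily the same as" rule concrete,
here is a minimal sketch in plain Java. It does not use the Hadoop API.
The class CombinerKeyCheck and the method keysStayEqual are hypothetical
names made up for illustration, and String.CASE_INSENSITIVE_ORDER merely
stands in for whatever sort comparator the job configures.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CombinerKeyCheck {

    // Hypothetical helper: returns true when every key the combiner wants
    // to emit compares equal to the input key under the sort comparator.
    static <K> boolean keysStayEqual(Comparator<K> sortComparator,
                                     K inputKey,
                                     List<K> outputKeys) {
        for (K out : outputKeys) {
            if (sortComparator.compare(inputKey, out) != 0) {
                return false; // this key would break the established order
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumption: the job sorts keys case-insensitively.
        Comparator<String> cmp = String.CASE_INSENSITIVE_ORDER;

        // Different strings, but equal per the comparator: allowed.
        System.out.println(
            keysStayEqual(cmp, "apple", Arrays.asList("Apple", "APPLE")));

        // "banana" sorts differently from "apple": not allowed.
        System.out.println(
            keysStayEqual(cmp, "apple", Arrays.asList("apple", "banana")));
    }
}
```

The first call accepts keys that differ only in case, the second
rejects a key that the comparator orders elsewhere.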
The partition of a record is not reevaluated after a map emits it, so
a combiner that emits records belonging to a different partition will
break grouping (i.e. the same key could appear in two different
reducers). Similarly, emitting records out of sorted order will have
undefined effects. It's possible to work around these constraints, or
even write applications that depend on them, but then your application
relies on implementation details of Apache Hadoop and may break in
subsequent releases.
-C
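
The partitioning pitfall can be simulated without a cluster. The sketch
below is plain Java, not Hadoop code. partitionOf uses the same
arithmetic as the default HashPartitioner, while the class
CombinerPartitionSketch, the key-rewriting badCombinerKey, and
partitionsHolding are hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class CombinerPartitionSketch {

    // Same arithmetic as Hadoop's default HashPartitioner.
    static int partitionOf(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Hypothetical misbehaving combiner: rewrites key "a" to "b".
    static String badCombinerKey(String key) {
        return key.equals("a") ? "b" : key;
    }

    // Returns the partitions (i.e. reducers) that end up holding `key`
    // once map output has been partitioned and the combiner has run.
    static Set<Integer> partitionsHolding(String key,
                                          List<String> mapOutputKeys,
                                          int numPartitions) {
        Set<Integer> result = new TreeSet<>();
        for (String k : mapOutputKeys) {
            // The partition is fixed when the map emits the record.
            int p = partitionOf(k, numPartitions);
            // The combiner runs later, inside partition p.
            String combined = badCombinerKey(k);
            if (combined.equals(key)) {
                result.add(p); // p is not recomputed for the new key
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // With 2 partitions, "a" hashes to partition 1 and "b" to 0.
        List<String> keys = Arrays.asList("a", "b");
        System.out.println(partitionsHolding("b", keys, 2)); // prints [0, 1]
    }
}
```

After the rewrite, key "b" sits in both partitions, so two different
reducers would each see part of its values.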