Hadoop, mail # user - Combiners


Re: Combiners
Owen O'Malley 2011-10-31, 03:22
On Sat, Oct 29, 2011 at 3:52 AM, Mathias Herberts <
[EMAIL PROTECTED]> wrote:

> My question is, what happens if the combiner outputs different keys
> than what it is being fed? The output of the combiner will suffer two
> flaws:
>
> 1. It won't be sorted
> 2. It might end up in the wrong partition
>
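As a rough illustration of the two flaws in the question, this plain-Java sketch (no Hadoop dependency) feeds one partition's sorted run through a hypothetical combiner that reverses each key. `partitionFor` uses the same formula as Hadoop's default `HashPartitioner`, but it is applied to `String.hashCode()` rather than `Text.hashCode()`, so the partition numbers will not match a real job. The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Hadoop-free simulation: one partition's sorted map-side run
// passed through a combiner that rewrites its keys.
class BadCombinerDemo {
    // Same arithmetic as Hadoop's default HashPartitioner, but on String.hashCode().
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Deliberately broken key mapping: emits a different key than it was fed.
    static String rewriteKey(String inputKey) {
        return new StringBuilder(inputKey).reverse().toString();
    }

    static boolean isSorted(List<String> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (keys.get(i - 1).compareTo(keys.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    // Applies the bad combiner to each key of a sorted run, in order.
    static List<String> combine(List<String> sortedRun) {
        List<String> out = new ArrayList<>();
        for (String key : sortedRun) {
            out.add(rewriteKey(key));
        }
        return out;
    }

    public static void main(String[] args) {
        int reducers = 4;
        // "ab" and "da" both hash to partition 1 of 4, so they share one sorted run.
        List<String> run = List.of("ab", "da");
        List<String> emitted = combine(run);
        System.out.println("input sorted:  " + isSorted(run));
        System.out.println("output sorted: " + isSorted(emitted));
        for (int i = 0; i < run.size(); i++) {
            System.out.printf("%s -> partition %d, %s -> partition %d%n",
                run.get(i), partitionFor(run.get(i), reducers),
                emitted.get(i), partitionFor(emitted.get(i), reducers));
        }
    }
}
```

With 4 reducers, "ab" and "da" both land in partition 1 and arrive sorted; after the rewrite, "ba" and "ad" are out of order and both hash to partition 3, so the reducer for partition 1 would receive keys that belong to another reducer.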

Yes, both can happen. We've talked about adding checks for this, but I don't
think anyone has implemented them. We obviously have the input key, so one
option would be to ignore the key the combiner outputs.

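One way to act on the "ignore the output key" idea, sketched in plain Java: a hypothetical wrapper that re-emits every combiner output under the key the combiner was fed, or optionally fails fast when the key changes. `SimpleCombiner` and `KeyPreservingCombiner` are invented for illustration; this is not Hadoop's `Reducer` API or anything Hadoop ships.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical combiner interface; not Hadoop's Reducer API.
interface SimpleCombiner<K, V> {
    void combine(K key, List<V> values, BiConsumer<K, V> emit);
}

// Wraps a combiner and re-emits every value under the key the combiner was fed.
// With strict = true it instead throws when the combiner tries to change the key.
class KeyPreservingCombiner<K, V> implements SimpleCombiner<K, V> {
    private final SimpleCombiner<K, V> delegate;
    private final boolean strict;

    KeyPreservingCombiner(SimpleCombiner<K, V> delegate, boolean strict) {
        this.delegate = delegate;
        this.strict = strict;
    }

    @Override
    public void combine(K key, List<V> values, BiConsumer<K, V> emit) {
        delegate.combine(key, values, (outKey, outValue) -> {
            if (strict && !key.equals(outKey)) {
                throw new IllegalStateException(
                    "combiner changed key " + key + " to " + outKey);
            }
            emit.accept(key, outValue); // always use the input key
        });
    }

    // Runs a combiner over one key group and collects the pairs it emits.
    static <K, V> List<Map.Entry<K, V>> run(SimpleCombiner<K, V> c, K key, List<V> values) {
        List<Map.Entry<K, V>> out = new ArrayList<>();
        c.combine(key, values, (k, v) -> out.add(Map.entry(k, v)));
        return out;
    }

    public static void main(String[] args) {
        // A misbehaving sum combiner that also upper-cases the key.
        SimpleCombiner<String, Integer> renaming = (key, values, emit) ->
            emit.accept(key.toUpperCase(), values.stream().mapToInt(Integer::intValue).sum());

        System.out.println(run(new KeyPreservingCombiner<>(renaming, false), "apple", List.of(1, 2, 3)));
        try {
            run(new KeyPreservingCombiner<>(renaming, true), "apple", List.of(1, 2, 3));
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Silently substituting the input key keeps the sort order and partitioning intact, but it hides the bug; the strict mode surfaces it instead, which is closer to the "checks" mentioned above.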
> Since a Combiner is simply a Reducer with no other constraints,
>

That isn't true. Combiners are required to be:
  1. Idempotent - The number of times the combiner is applied can't change the output.
  2. Commutative - The order of the inputs can't change the output.
  3. Side-effect free - Combiners can't have side effects (or they won't be idempotent).
  4. Sort-order preserving - They can't change the keys in a way that disrupts the sort order.
  5. Partitioning preserving - They can't change the keys in a way that changes the partitioning.

All five properties are required for combiners.
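A cheap way to catch violations of the first two properties is to compare, on sample data, the result of applying a function zero, one, or two times map-side, and of reversing its inputs. This sketch assumes the same function serves as both combiner and reducer (a common setup, but an assumption here). `safeOnSample` is a hypothetical helper; passing it only means no counterexample was found in the sample, not that the properties hold. Properties 3 to 5 are not checked.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical test harness: does a function used as both combiner and reducer
// give the same answer whether it runs 0, 1 or 2 times map-side, and regardless
// of input order? Only checks the given sample data.
class CombinerPropertyCheck {
    static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    // Integer mean: a classic reducer that is NOT a safe combiner.
    static long mean(List<Long> values) {
        return sum(values) / values.size();
    }

    static boolean safeOnSample(Function<List<Long>, Long> f, List<List<Long>> spills) {
        List<Long> all = new ArrayList<>();
        spills.forEach(all::addAll);

        long noCombine = f.apply(all);

        // Combiner applied once per spill, then the reducer.
        List<Long> oncePerSpill = new ArrayList<>();
        for (List<Long> spill : spills) {
            oncePerSpill.add(f.apply(spill));
        }
        long combinedOnce = f.apply(oncePerSpill);

        // A second combiner pass, e.g. when spills are merged, then the reducer.
        long combinedTwice = f.apply(List.of(f.apply(oncePerSpill)));

        // Same inputs, different order.
        List<Long> reversed = new ArrayList<>(all);
        Collections.reverse(reversed);
        long reordered = f.apply(reversed);

        return noCombine == combinedOnce && noCombine == combinedTwice && noCombine == reordered;
    }

    public static void main(String[] args) {
        List<List<Long>> spills = List.of(List.of(1L, 2L, 3L), List.of(10L));
        System.out.println("sum safe on sample:  " + safeOnSample(CombinerPropertyCheck::sum, spills));
        System.out.println("mean safe on sample: " + safeOnSample(CombinerPropertyCheck::mean, spills));
    }
}
```

Summing passes; the integer mean fails because mean(mean(1, 2, 3), mean(10)) = mean(2, 10) = 6, while the mean of all four values is 4.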

-- Owen