Hadoop >> mail # user >> Combiners


Mathias Herberts 2011-10-29, 10:52
On Sat, Oct 29, 2011 at 3:52 AM, Mathias Herberts <[EMAIL PROTECTED]> wrote:

> My question is, what happens if the combiner outputs different keys
> than what it is being fed? The output of the combiner will suffer from two
> flaws:
>
> 1. It won't be sorted
> 2. It might end up in the wrong partition
>

Yes. We've talked about adding various checks, but I don't think anyone has
added them. We obviously have the input key, so one option would be to
ignore the combiner's output key.
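
To make the two options above concrete, here is a minimal sketch in plain Java (no Hadoop classes). The Combiner interface, combineChecked, and combineKeepingInputKey are hypothetical names invented for illustration and are not part of Hadoop; the combiner is also simplified to emit a single record per key group. The only piece taken from Hadoop is the arithmetic in partition, which mirrors the default HashPartitioner.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class CombinerKeyCheck {
    /** Hypothetical, simplified combiner: one key group in, one record out. */
    interface Combiner {
        Map.Entry<String, Integer> combine(String key, List<Integer> values);
    }

    /** Same arithmetic as Hadoop's default HashPartitioner. */
    static int partition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Strict check: reject a combiner that emits a key other than the one it was fed. */
    static Map.Entry<String, Integer> combineChecked(
            Combiner c, String key, List<Integer> values) {
        Map.Entry<String, Integer> out = c.combine(key, values);
        if (!key.equals(out.getKey())) {
            throw new IllegalStateException(
                "combiner changed key " + key + " to " + out.getKey());
        }
        return out;
    }

    /** Lenient option: ignore the output key and reuse the input key. */
    static Map.Entry<String, Integer> combineKeepingInputKey(
            Combiner c, String key, List<Integer> values) {
        Map.Entry<String, Integer> out = c.combine(key, values);
        return new AbstractMap.SimpleEntry<>(key, out.getValue());
    }

    public static void main(String[] args) {
        // A misbehaving combiner: sums the values but rewrites the key.
        Combiner bad = (key, values) -> {
            int sum = 0;
            for (int v : values) sum += v;
            return new AbstractMap.SimpleEntry<>(key.toUpperCase(), sum);
        };
        List<Integer> values = Arrays.asList(1, 2, 3);
        Map.Entry<String, Integer> raw = bad.combine("apple", values);

        // The rewritten key may hash to a different partition than the input key.
        System.out.println("partition of input key:  " + partition("apple", 4));
        System.out.println("partition of output key: " + partition(raw.getKey(), 4));
        System.out.println("kept input key: "
            + combineKeepingInputKey(bad, "apple", values));
    }
}
```

The strict variant surfaces the bug to the job author, while the lenient one silently repairs the key; which is preferable is a design decision the thread leaves open.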
> Since a Combiner is simply a Reducer with no other constraints,
>

That isn't true. Combiners are required to be:
  1. Idempotent - The number of times the combiner is applied can't change
the output
  2. Transitive (commutative and associative) - The order of the inputs
can't change the output
  3. Side-effect free - Combiners can't have side effects (or they won't be
idempotent).
  4. Preserve the sort order - They can't change the keys to disrupt the
sort order
  5. Preserve the partitioning - They can't change the keys to change the
partitioning

All 5 of them are required for combiners.
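
The first two requirements can be illustrated with a small plain-Java simulation (no Hadoop classes). This is a sketch under stated assumptions: the run helper is hypothetical and models "the framework may apply the combiner zero or more times" by collapsing fixed-size chunks of values before the final reduce. A sum survives any number of passes and any input order, while a mean does not, so a mean cannot be used directly as a combiner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

final class CombinerProperties {
    static double sum(List<Double> values) {
        double total = 0;
        for (double v : values) total += v;
        return total;
    }

    static double mean(List<Double> values) {
        return sum(values) / values.size();
    }

    /**
     * Hypothetical model of a job for a single key: apply fn as a combiner
     * to chunks of the values combinerPasses times, then apply fn once more
     * as the final reduce.
     */
    static double run(Function<List<Double>, Double> fn, List<Double> values,
                      int chunkSize, int combinerPasses) {
        List<Double> current = new ArrayList<>(values);
        for (int pass = 0; pass < combinerPasses; pass++) {
            List<Double> next = new ArrayList<>();
            for (int i = 0; i < current.size(); i += chunkSize) {
                int end = Math.min(i + chunkSize, current.size());
                next.add(fn.apply(current.subList(i, end)));
            }
            current = next;
        }
        return fn.apply(current);
    }

    public static void main(String[] args) {
        List<Double> values = Arrays.asList(1.0, 2.0, 3.0, 4.0, 10.0);
        List<Double> reversed = new ArrayList<>(values);
        Collections.reverse(reversed);

        for (int passes = 0; passes <= 2; passes++) {
            // Sum: identical for every pass count and for both input orders.
            System.out.println("sum  passes=" + passes + ": "
                + run(CombinerProperties::sum, values, 2, passes) + " / "
                + run(CombinerProperties::sum, reversed, 2, passes));
            // Mean: depends on how often the combiner ran and on input order.
            System.out.println("mean passes=" + passes + ": "
                + run(CombinerProperties::mean, values, 2, passes) + " / "
                + run(CombinerProperties::mean, reversed, 2, passes));
        }
    }
}
```

The usual workaround for a mean is to have the combiner carry a (sum, count) pair and divide only in the reducer. Requirements 3 to 5 are not exercised by this sketch.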

-- Owen
Mathias Herberts 2011-10-31, 12:41
Owen OMalley 2011-10-31, 15:15
Shevek 2011-10-31, 17:41