Hadoop, mail # user - When to use a combiner?

Steve Lewis 2012-01-24, 17:33
While working on a sample problem I used a combiner, and I noticed that the
combiner output records were 90% of the combiner input records. Looking at
the data, I found relatively few duplicated keys. This raises the question of
what fraction of duplicate keys makes a combiner worthwhile. If every key is
unique, I presume a combiner will waste time and resources, especially if the
data is large. But what fraction of duplicated keys is needed to justify a
combiner?
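One rough way to frame the question: a sum-style combiner emits one record per distinct key it sees, so its output/input ratio is about (distinct keys) / (records). The sketch below computes that ratio for a sample of map-output keys. The class and method names are hypothetical and not part of Hadoop. It also assumes the combiner sees the whole sample at once. As I understand it, Hadoop actually runs the combiner per spill, so the real ratio can be higher than this estimate.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not a Hadoop API) estimating how much a
 * sum-style combiner would shrink one batch of map output.
 */
public class CombinerEstimate {

    /** Returns distinct keys / total records for the given key sample. */
    public static double outputInputRatio(List<String> keys) {
        if (keys.isEmpty()) {
            // Nothing to combine: treat as "no reduction".
            return 1.0;
        }
        // Count occurrences per key; a summing combiner would emit
        // one record for each entry in this map.
        Map<String, Integer> counts = new HashMap<>();
        for (String key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        return (double) counts.size() / keys.size();
    }

    public static void main(String[] args) {
        // Ten records, nine distinct keys ("a" appears twice).
        List<String> sample = List.of("a", "b", "c", "d", "e", "f", "g", "h", "i", "a");
        double ratio = outputInputRatio(sample);
        System.out.printf("combiner output/input ratio: %.2f%n", ratio);
    }
}
```

The sample in `main` has 9 distinct keys out of 10 records, a ratio of 0.9. That matches the 90% figure above: the combiner removes only one record in ten, while still paying to sort, deserialize and re-serialize every record it sees.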

