Hadoop >> mail # user >> When to use a combiner?

When to use a combiner?
While working through a sample problem I used a combiner. I noticed that the
combiner output records were 90% of the combiner input records, and when I
looked at the data I found relatively few duplicated keys. This raises the
question of what fraction of duplicated keys makes a combiner worthwhile.
If every key is unique, I presume a combiner only wastes time and resources,
especially when the data is large. So what fraction of duplicated keys is
needed to justify using one?
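
To make the measurement concrete, here is a minimal sketch (plain Python, not
Hadoop code) of how the output/input ratio could be estimated on a sample of
map output keys. It assumes a single combiner pass over one batch and a
combiner that emits exactly one record per distinct key. The function name
combine_ratio and the sample keys are hypothetical, chosen only to reproduce
the 90% figure.

```python
def combine_ratio(keys):
    """Return combiner output records / combiner input records for one batch.

    Assumption: the combiner collapses all records sharing a key into a
    single record, so output records == number of distinct keys.
    """
    keys = list(keys)
    if not keys:
        return 1.0  # empty batch: nothing to combine, treat as no reduction
    return len(set(keys)) / len(keys)


# Hypothetical sample: 10 map output records, with key "a" appearing twice.
sample = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "a"]

ratio = combine_ratio(sample)
print(f"combiner output/input ratio: {ratio:.2f}")
```

A ratio near 1.0 means the combiner removes almost nothing. The sketch only
measures the reduction; it does not say at what ratio the combiner's own cost
is repaid.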

Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com