Hadoop >> mail # user >> When to use a combiner?


When to use a combiner?
While working through a sample problem I used a combiner and noticed that
the combiner output records were 90% of the combiner input records. Looking
at the data, I found relatively few duplicated keys. This raises the
question of what fraction of duplicate keys makes a combiner worthwhile.
If every key is unique, I presume a combiner only wastes time and
resources, especially when the data is large. But what fraction of
duplicated keys is needed to justify one?
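
To make the 90% figure concrete, here is a minimal sketch in plain Python
(not Hadoop code) of how the output/input ratio could be measured on a sample
of map output keys. The function name `combiner_ratio` and the sample keys are
hypothetical, and the sketch assumes a sum-style combiner that emits exactly
one record per distinct key in the batch it sees.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def combiner_ratio(keys: Iterable[Hashable]) -> Tuple[int, int, float]:
    """Simulate a sum-style combiner over one batch of map output keys.

    Returns (input_records, output_records, output/input ratio).
    A ratio near 1.0 means the combiner removes almost nothing.
    """
    counts = Counter(keys)          # one combined record per distinct key
    n_in = sum(counts.values())     # records entering the combiner
    n_out = len(counts)             # records leaving the combiner
    ratio = n_out / n_in if n_in else 1.0  # empty batch: nothing saved
    return n_in, n_out, ratio


# Hypothetical sample: 10 map output records with 9 distinct keys,
# which reproduces the 90% ratio described above.
sample = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "a"]
n_in, n_out, ratio = combiner_ratio(sample)
print(n_in, n_out, ratio)
```

As far as I understand, the combiner in a real job runs on batches of
buffered map output rather than on the whole map output at once, so the
measured ratio would also depend on how many records land in one batch.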

--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com