NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

MapReduce >> mail # user >> Difference between combiner and aggregator


Difference between combiner and aggregator
Hi,
I am trying to understand the difference between a combiner and an aggregator.

Based on my reading, here is the word count example (mapper only):

Aggregator:
class Mapper
  method MAP
    H <-- new associative array
    for all term t in document do
      H{t} = H{t} + 1
    for all term t in H do
      EMIT(term t, count H{t})
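For comparison, here is a minimal plain-Java sketch of the first ("aggregator") mapper. It does not use the Hadoop API: the class name `PerDocumentAggregationMapper` and the `emitted` list are hypothetical stand-ins (in real Hadoop the mapper would call `Context.write` instead). The point it shows is that the associative array is created inside `map()`, so counts are only aggregated within a single document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation (no Hadoop dependency) of per-document local aggregation.
final class PerDocumentAggregationMapper {
    // Hypothetical stand-in for the framework's output collector.
    final List<Map.Entry<String, Integer>> emitted = new ArrayList<>();

    // Called once per document; H lives only for the duration of this call.
    void map(String document) {
        Map<String, Integer> h = new HashMap<>();
        for (String term : document.split("\\s+")) {
            if (!term.isEmpty()) {
                h.merge(term, 1, Integer::sum); // H{t} = H{t} + 1
            }
        }
        // Emit one (term, count) pair per distinct term in this document.
        for (Map.Entry<String, Integer> e : h.entrySet()) {
            emitted.add(Map.entry(e.getKey(), e.getValue()));
        }
    }

    public static void main(String[] args) {
        PerDocumentAggregationMapper m = new PerDocumentAggregationMapper();
        m.map("a rose is a rose");   // emits a=2, rose=2, is=1
        m.map("a daisy is a daisy"); // emits a=2, daisy=2, is=1
        System.out.println(m.emitted.size()); // 6
    }
}
```

Terms that appear in both documents ("a", "is") are still emitted once per document, so some duplicate keys reach the shuffle.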
Combiner:
class Mapper
  method INITIALIZE
    H <-- new associative array
  method MAP
    for all term t in document do
      H{t} = H{t} + 1
  method CLOSE
    for all term t in H do
      EMIT(term t, count H{t})
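The second version is often described as "in-mapper combining": the associative array survives across `map()` calls, so counts are aggregated over every document the map task sees. In Hadoop's `org.apache.hadoop.mapreduce.Mapper`, INITIALIZE and CLOSE correspond to the `setup` and `cleanup` methods. The sketch below is again plain Java rather than the Hadoop API; the class name, `run` driver and `emitted` list are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of in-mapper combining across a whole input split.
final class InMapperCombiningMapper {
    // Hypothetical stand-in for the framework's output collector.
    final List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
    private Map<String, Integer> h;

    // INITIALIZE: once per task, before the first document (cf. setup()).
    void initialize() {
        h = new HashMap<>();
    }

    // MAP: once per document; only updates H, emits nothing.
    void map(String document) {
        for (String term : document.split("\\s+")) {
            if (!term.isEmpty()) {
                h.merge(term, 1, Integer::sum);
            }
        }
    }

    // CLOSE: once per task, after the last document (cf. cleanup()).
    void close() {
        for (Map.Entry<String, Integer> e : h.entrySet()) {
            emitted.add(Map.entry(e.getKey(), e.getValue()));
        }
        h = null; // release the table
    }

    // Drives the lifecycle the way a framework would for one split.
    void run(List<String> split) {
        initialize();
        for (String doc : split) {
            map(doc);
        }
        close();
    }

    public static void main(String[] args) {
        InMapperCombiningMapper m = new InMapperCombiningMapper();
        m.run(List.of("a rose is a rose", "a daisy is a daisy"));
        System.out.println(m.emitted.size()); // 4 distinct terms: a, rose, is, daisy
    }
}
```

On the same two documents this emits 4 pairs instead of 6. The cost is that H must hold every distinct term of the split in memory; a real implementation may need to flush H early when it grows too large.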

So the second method is how a combiner is implemented.
But isn't the first one much simpler?
What do I gain by using a combiner instead of local aggregation?
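For contrast, a Hadoop combiner proper is a separate class registered with `Job.setCombinerClass(...)`, and the framework may run it zero, one, or more times on a map task's buffered output before the shuffle. The plain-Java sketch below (hypothetical class and method names, not the Hadoop API) simulates that: the mapper emits (term, 1) for every token, and a combine step groups the buffered pairs by key and folds them with an associative, commutative function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;

// Plain-Java simulation of a separate combiner step applied to map output.
final class CombinerSimulation {

    // Naive mapper: no local aggregation, one (term, 1) pair per token.
    static List<Map.Entry<String, Integer>> map(List<String> split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String doc : split) {
            for (String term : doc.split("\\s+")) {
                if (!term.isEmpty()) {
                    out.add(Map.entry(term, 1));
                }
            }
        }
        return out;
    }

    // Combiner: group buffered pairs by key and fold the values. The fold must be
    // associative and commutative, since it may be applied any number of times.
    static List<Map.Entry<String, Integer>> combine(
            List<Map.Entry<String, Integer>> mapOutput, BinaryOperator<Integer> fold) {
        Map<String, Integer> grouped = new TreeMap<>(); // sorted, like map output
        for (Map.Entry<String, Integer> e : mapOutput) {
            grouped.merge(e.getKey(), e.getValue(), fold);
        }
        return new ArrayList<>(grouped.entrySet());
    }

    public static void main(String[] args) {
        List<String> split = List.of("a rose is a rose", "a daisy is a daisy");
        List<Map.Entry<String, Integer>> raw = map(split);
        List<Map.Entry<String, Integer>> combined = combine(raw, Integer::sum);
        System.out.println(raw.size() + " -> " + combined.size()); // 10 -> 4
        System.out.println(combined); // [a=4, daisy=2, is=2, rose=2]
    }
}
```

Both approaches end up shipping the same 4 pairs here, but they get there differently. Roughly: in-mapper combining always applies and never materializes the 10 intermediate pairs, but you manage memory yourself; a combiner keeps the mapper trivial and lets the framework bound memory, at the cost of creating and buffering those pairs first and with no guarantee it runs at all.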