MapReduce >> mail # user >> anyway to do "local" reduce like the combiner does?


anyway to do "local" reduce like the combiner does?


I have a problem at hand that seems to need "local" reducing:
I have a large input data set in which each line is a mapping of the form "name : attribute". The attributes for a given name are usually close together in the file, so they are very likely to be processed by the same mapper. I need to persist each "name : attributes" record somewhere else (think of a database). Ideally I would combine the attributes for the same name and persist them only once. Attributes for the same name that come from different mappers can safely be persisted separately.

I don't want to use reducers because of the network traffic. What I need is exactly what a combiner does, but as far as I can tell, combiners are not guaranteed to run, nor to run only once (correct me if I'm wrong here), so I guess I shouldn't implement the persistence in the combiner.
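
To make the requirement concrete, here is a minimal sketch of the kind of local grouping described above, written as plain Java with no Hadoop dependency. The class name LocalCombiner, the maxNames bound, and the sink callback (standing in for the database write) are all hypothetical names introduced only for illustration. It assumes each line splits at the first ':' and that a name's attributes arrive close together, so evicting the oldest buffered name is usually safe.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical helper, not part of Hadoop: buffers attributes per name inside
// a single mapper and hands each buffered name to a sink exactly once.
class LocalCombiner {
    // Insertion-ordered, so the first entry is the name buffered longest ago.
    private final Map<String, List<String>> buffer = new LinkedHashMap<>();
    private final int maxNames;                          // assumed memory bound
    private final BiConsumer<String, List<String>> sink; // stand-in for the DB write

    LocalCombiner(int maxNames, BiConsumer<String, List<String>> sink) {
        this.maxNames = maxNames;
        this.sink = sink;
    }

    // Call once per input line of the form "name : attribute".
    void add(String line) {
        int sep = line.indexOf(':');
        if (sep < 0) {
            return; // skip malformed lines
        }
        String name = line.substring(0, sep).trim();
        String attribute = line.substring(sep + 1).trim();
        buffer.computeIfAbsent(name, k -> new ArrayList<>()).add(attribute);

        // Keep memory bounded: persist and drop the oldest name when over the limit.
        if (buffer.size() > maxNames) {
            Iterator<Map.Entry<String, List<String>>> it = buffer.entrySet().iterator();
            Map.Entry<String, List<String>> eldest = it.next();
            String eldestName = eldest.getKey();
            List<String> eldestAttributes = eldest.getValue();
            it.remove();
            sink.accept(eldestName, eldestAttributes);
        }
    }

    // Persist everything still buffered; call once after the last line.
    void flush() {
        buffer.forEach(sink);
        buffer.clear();
    }
}
```

In a Hadoop job, something like this could presumably be driven from the mapper itself, calling add() from map() and flush() from the mapper's cleanup() method, so no combiner or reducer is involved. If a name reappears after it has been evicted, it is persisted a second time, which is the same situation as attributes arriving from different mappers.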

Has anybody had a similar problem before? What was your solution?
I appreciate your help.
Thanks,
James