MapReduce user mailing list

Any way to do a "local" reduce like the combiner does?

I have a problem at hand that seems to need "local" reducing:
My input is large, and each line is a mapping of the form "name : attribute". The attributes for a given name are usually close together in the file, so they are very likely to be processed by the same mapper. I need to persist the "name : attributes" pairs somewhere else (think DB). Ideally I would combine all the attributes for a name and persist them only once. Attributes for the same name coming from different mappers can safely be persisted separately.

I don't want to use reducers because of the network traffic they add. What I need is exactly what a combiner does, but as far as I can tell, a combiner is not guaranteed to run at all, or to run only once (correct me if I'm wrong), so I assume I shouldn't implement the persistence in the combiner.
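A minimal sketch of the "local" reduce described above, assuming a map-only job (for example Hadoop Streaming with `-numReduceTasks 0`) and a hypothetical `persist()` function standing in for the database write. It buffers attributes per name inside one mapper and writes each name once when that mapper's input ends. Because attributes for the same name may safely be persisted more than once, the sketch also evicts the least recently seen name when the buffer exceeds an assumed `max_names` cap, so memory stays bounded.

```python
from collections import OrderedDict


def persist(name, attributes):
    """Hypothetical stand-in for the real DB write; here it just prints."""
    print(f"{name}\t{','.join(attributes)}")


def local_reduce(lines, flush=persist, max_names=100_000):
    """Group "name : attribute" lines per name inside one mapper and
    flush each group once, instead of sending records to a reducer."""
    buffer = OrderedDict()  # name -> list of attributes, least recently seen first
    for line in lines:
        line = line.strip()
        if ":" not in line:
            continue  # skip blank or malformed lines
        name, attribute = line.split(":", 1)
        name, attribute = name.strip(), attribute.strip()
        buffer.setdefault(name, []).append(attribute)
        buffer.move_to_end(name)  # mark this name as most recently seen
        if len(buffer) > max_names:
            # Assumed memory cap: persist the least recently seen name early.
            # It may be written again later, which is safe for this use case.
            old_name, old_attrs = buffer.popitem(last=False)
            flush(old_name, old_attrs)
    # End of this mapper's input: persist whatever is still buffered.
    for name, attrs in buffer.items():
        flush(name, attrs)
```

As a Streaming mapper, the script would end with `local_reduce(sys.stdin)`. In a Java job, the same buffering would sit in the Mapper's `map()` method, with the final flush in `cleanup()`, which runs once per map task.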

Has anybody run into a similar problem before? What was your solution?
I'd appreciate your help.