I have a problem at hand that seems to need "local" reducing:
I have a large input data set in which each line is a mapping of the form "name : attribute". The attributes for the same name are usually close together in the file, so they are very likely to be processed by the same mapper. I need to persist the "name : attributes" pairs somewhere else (think DB). Ideally, I would combine the attributes of the same name and persist them only once. Attributes for the same name that come from different mappers can safely be persisted separately.
I don't want to use reducers because of the network traffic. What I need is exactly what a combiner does, but as far as I can tell, combiners are not guaranteed to run at all, and may run more than once (correct me if I'm wrong here), so I guess I am not supposed to implement the persistence in the combiner.
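To make the idea concrete, here is a rough sketch of the kind of "local" reducing I have in mind, written in plain Java with no Hadoop classes so it runs on its own. All names here (LocalCombineSketch, Sink, maxBufferedNames) are made up for illustration. The assumption is that accept() would be called from the mapper's map() method and flush() from its cleanup() method, and that flushing early when the buffer gets too large is acceptable, even though it means a name could occasionally be persisted more than once by the same mapper.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LocalCombineSketch {

    /** Hypothetical stand-in for the real persistence layer (e.g. a DB write). */
    interface Sink {
        void persist(String name, List<String> attributes);
    }

    // Attributes seen so far by this mapper, grouped by name (insertion order kept).
    private final Map<String, List<String>> buffer = new LinkedHashMap<>();
    private final Sink sink;
    private final int maxBufferedNames; // assumed memory guard, not a Hadoop setting

    LocalCombineSketch(Sink sink, int maxBufferedNames) {
        this.sink = sink;
        this.maxBufferedNames = maxBufferedNames;
    }

    /** Would be called once per input line of the form "name : attribute". */
    void accept(String line) {
        int sep = line.indexOf(':');
        if (sep < 0) {
            return; // skip malformed lines
        }
        String name = line.substring(0, sep).trim();
        String attribute = line.substring(sep + 1).trim();
        buffer.computeIfAbsent(name, k -> new ArrayList<>()).add(attribute);
        if (buffer.size() > maxBufferedNames) {
            flush(); // early flush to bound memory use
        }
    }

    /** Would be called at the end of the map task to persist what is left. */
    void flush() {
        for (Map.Entry<String, List<String>> e : buffer.entrySet()) {
            sink.persist(e.getKey(), e.getValue());
        }
        buffer.clear();
    }

    public static void main(String[] args) {
        LocalCombineSketch combiner = new LocalCombineSketch(
                (name, attrs) -> System.out.println(name + " -> " + attrs), 1000);
        for (String line : new String[] {"alice : red", "alice : tall", "bob : blue"}) {
            combiner.accept(line);
        }
        combiner.flush();
    }
}
```

Unlike a combiner, this buffering happens inside the map task itself, so whether and when it runs is under my control. What I'm unsure about is whether doing the persistence this way is considered reasonable practice.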
Has anybody run into a similar problem before? What was your solution?
I'd appreciate your help.