On Tue, Oct 16, 2012 at 8:18 AM, Amit Sela <[EMAIL PROTECTED]> wrote:
> Has anyone tried extending PutSortReducer in order to add some traditional
> reduce logic (i.e., aggregating counters)?
> I want to process data with a Hadoop MapReduce job (aggregate counters per
> key -- traditional Hadoop MR), but I want to bulk load the reduce output to
> HBase.
> As I understand things, the "native" way to do it is to run two jobs: the
> first to aggregate counters by key, and the second to create Puts (Map
> phase) and bulk load into HBase.
> I was thinking of combining the two into one MapReduce job, where the Map of
> the first job is the Map of the combined job, and the Reducer of the new job
> extends PutSortReducer so that the reduce logic of the first job runs first,
> and then PutSortReducer's reduce goes into action to write out the sorted
> Puts.
> Any thoughts? Has anyone tried something similar and has something to add /
> correct?
The general rule is to try to avoid the sort phase of MapReduce when
writing to HBase, since HBase is going to 'sort' whatever you give it
(also, inserting into HBase in order doesn't tend to spread the load
well -- especially if there are but a few reducers). But if you have a
fat enough map task, you might find it worth the extra resource burn.