extending PutSortReducer
Hi all,

Has anyone tried extending PutSortReducer to add some traditional
reduce logic (e.g., aggregating counters)?

I want to process data with a Hadoop MapReduce job (aggregating counters per
key, traditional Hadoop MR), but I want to bulk load the reduce output into
HBase.
As I understand it, the "native" way to do this is to run two jobs: the
first aggregates counters by key, and the second creates Puts (in the map
phase) and bulk loads them into HBase
(HFileOutputFormat.configureIncrementalLoad()).

I was thinking of combining the two into a single MapReduce job: the Mapper
of the first job becomes the Mapper of the combined job, and the Reducer of
the new job extends PutSortReducer, so that the reduce logic of the first
job runs first and then PutSortReducer's reduce takes over to write the
output as KeyValues.
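
To make the idea concrete, here is a rough sketch of what the combined
reduce step would have to do for a single row key: sum the partial counts
per column, then emit one cell per column in sorted order. It is plain Java
with stand-in types, not real HBase code. PartialCount, Cell and
CombinedReduceSketch are hypothetical names made up for illustration, and
string ordering is used as a simplification of HBase's byte ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of "aggregate, then emit sorted cells" for one row key.
// All types here are hypothetical stand-ins, not HBase classes.
final class CombinedReduceSketch {

    // Stand-in for one partial count emitted by the mapper for a column.
    static final class PartialCount {
        final String family;
        final String qualifier;
        final long count;

        PartialCount(String family, String qualifier, long count) {
            this.family = family;
            this.qualifier = qualifier;
            this.count = count;
        }
    }

    // Stand-in for the KeyValue the real reducer would write out.
    static final class Cell {
        final String row;
        final String family;
        final String qualifier;
        final long value;

        Cell(String row, String family, String qualifier, long value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    static List<Cell> reduce(String row, Iterable<PartialCount> values) {
        // Step 1 (logic of the first job): sum counts per family/qualifier.
        // TreeMap keeps families and qualifiers in sorted order.
        Map<String, Map<String, Long>> totals = new TreeMap<>();
        for (PartialCount p : values) {
            totals.computeIfAbsent(p.family, f -> new TreeMap<>())
                  .merge(p.qualifier, p.count, Long::sum);
        }

        // Step 2 (the sorting reducer's role): emit one cell per column,
        // in sorted order, carrying the aggregated value.
        List<Cell> out = new ArrayList<>();
        for (Map.Entry<String, Map<String, Long>> fam : totals.entrySet()) {
            for (Map.Entry<String, Long> col : fam.getValue().entrySet()) {
                out.add(new Cell(row, fam.getKey(), col.getKey(), col.getValue()));
            }
        }
        return out;
    }
}
```

In the real job, as far as I understand, PutSortReducer takes Put values as
its input, so the Mapper would have to emit Puts carrying the partial
counts, and the subclass would build one aggregated Put per row before
handing it to the parent's reduce. Since
HFileOutputFormat.configureIncrementalLoad() sets the job's reducer itself,
the custom reducer class would presumably have to be set after that call.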

Any thoughts? Has anyone tried something similar, or have anything to add
or correct?

Thanks,
Amit.