Hi HBase users,
I am currently trying to set up a denormalization MapReduce job for my HBase table.
Since our table contains a large volume of data, it's inefficient to use a single Scan object over the whole table. We only need to process the records that have changed. Because we keep track of which rows have changed, I am planning to use multiple Scan objects, each covering one row range.
So I am trying to set up the MapReduce job with multiple Scan objects. Is this possible?
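A minimal sketch of the range-building step, assuming the change tracker can report changed rows as half-open [startRow, stopRow) key ranges. The class `ScanRanges` and its `Range` type are hypothetical, and keys are modelled as ASCII `String`s for simplicity (real HBase row keys are `byte[]` compared as unsigned bytes). The sketch sorts the ranges and merges overlapping or touching ones, so that no row would be read by two Scans. Each resulting range could then back one `Scan` via `setStartRow`/`setStopRow`. In HBase versions that ship `MultiTableInputFormat`, `TableMapReduceUtil.initTableMapperJob` has an overload that takes a `List<Scan>` (each Scan also needs its table name set as an attribute); check whether your version includes it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: coalesces tracked change ranges into one range per Scan. */
public class ScanRanges {

    /** Half-open row-key range [start, stop); stop is assumed non-empty. */
    public static final class Range {
        public final String start;
        public final String stop;

        public Range(String start, String stop) {
            this.start = start;
            this.stop = stop;
        }
    }

    /** Sorts ranges by start key and merges any that overlap or touch. */
    public static List<Range> coalesce(List<Range> changed) {
        List<Range> sorted = new ArrayList<>(changed);
        sorted.sort(Comparator.comparing((Range r) -> r.start));
        List<Range> merged = new ArrayList<>();
        for (Range r : sorted) {
            if (!merged.isEmpty()) {
                Range last = merged.get(merged.size() - 1);
                // r begins at or before the end of the previous range: extend it
                if (r.start.compareTo(last.stop) <= 0) {
                    String stop = r.stop.compareTo(last.stop) > 0 ? r.stop : last.stop;
                    merged.set(merged.size() - 1, new Range(last.start, stop));
                    continue;
                }
            }
            merged.add(r);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Range> changed = Arrays.asList(
                new Range("row300", "row400"),
                new Range("row100", "row150"),
                new Range("row140", "row200"));
        // Each printed range would become one Scan's start/stop row
        for (Range r : coalesce(changed)) {
            System.out.println(r.start + " .. " + r.stop);
        }
    }
}
```

Running `main` prints `row100 .. row200` and `row300 .. row400`: the two overlapping ranges collapse into one, so the job would need two Scans instead of three.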
I have seen some posts online suggesting extending the InputFormat class and overriding getSplits(). Is this the most efficient way?
Using a filter doesn't seem very efficient in my case, because it still scans the whole table and just filters out certain records, right?
Can you point me in the right direction?
Doug Meil 2013-01-18, 23:48
Ted Yu 2013-01-19, 17:20