HBase, mail # user - Question about MapReduce


Question about MapReduce
Jean-Marc Spaggiari 2012-10-27, 20:30
Hi,

I'm thinking about my first MapReduce class and I have some questions.

The goal of it will be to move some rows from one table to another one
based on the timestamp only.

Since this is pretty new for me, I'm starting from the RowCounter
class to have a baseline.

There are a few things I will have to update. First, the
createSubmittableJob method, to get a timestamp range instead of a key
range and "play" with the parameters. This part is fine.

Next, I need to update the map method, and this is where I have some questions.

I'm able to find the timestamp of all the cf:c from the
context.getCurrentValue() method, that's fine. Now, my concern is on
the way to get access to the table to store this field, and the table
to delete it. Should I instantiate an HTable for the source table, and
execute a delete on it, then do an insert on another HTable
instance?  Should I use an HTablePool? Also, since I’m already on the
row, can’t I just mark it as deleted instead of calling a new HTable?

Also, instead of calling the delete and put one by one, I would like
to put them on a list and execute it only when it’s over 10 members.
How can I make sure that at the end of the job, this is flushed? Else,
I will lose some operations. Is there a kind of “dispose” method
called on the region when the job is done?
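
To make the batching idea concrete, here is a rough sketch of what I
have in mind. It is plain Java with no HBase calls: BatchBuffer, Sink
and flush() are names I made up for illustration, and the sink stands
in for whatever would actually send the puts/deletes to the table. The
open question is where the final flush() should be called from.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not an HBase class): collects operations and
// hands them to a sink once more than `threshold` of them are pending.
class BatchBuffer<T> {

    // Stand-in for "send this batch to the table".
    interface Sink<T> {
        void accept(List<T> batch);
    }

    private final int threshold;
    private final Sink<T> sink;
    private final List<T> pending = new ArrayList<>();

    BatchBuffer(int threshold, Sink<T> sink) {
        this.threshold = threshold;
        this.sink = sink;
    }

    // Called once per operation; flushes when the list is over the threshold.
    void add(T op) {
        pending.add(op);
        if (pending.size() > threshold) {
            flush();
        }
    }

    // Must also be called once at the very end, otherwise the
    // operations still pending in the list are lost.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending)); // pass a copy, then reset
        pending.clear();
    }

    int pendingCount() {
        return pending.size();
    }
}
```

With a threshold of 10, each batch goes out when the 11th operation is
added, and anything left over stays in the list until someone calls
flush() explicitly.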

Thanks,

JM