HBase, mail # user - Question about MapReduce


Re: Question about MapReduce
Jean-Marc Spaggiari 2012-10-29, 15:11
I'm replying to myself ;)

I found the "setup" and "cleanup" methods in the TableMapper class, so I
think those are the methods I was looking for. I will init the
HTablePool there. Please let me know if I'm wrong.
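
Editorial sketch of the idea above, combined with the batching question from the quoted message: buffer operations in map and flush whatever is left in cleanup, so nothing is lost at the end of the task. To stay runnable without HBase on the classpath, this uses a plain class as a stand-in for a mapper. The class name, the String row keys, and the flushedBatches list are all hypothetical; real code would extend TableMapper, override setup, map, and cleanup, and send the buffered Puts and Deletes to the tables opened in setup.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a mapper. It only imitates the
// setup -> map (once per row) -> cleanup lifecycle of a Hadoop Mapper.
class BufferedMoveSketch {
    // Flush threshold, taken from the "10 members" in the quoted question.
    static final int BATCH_SIZE = 10;

    // Operations waiting to be written (row keys stand in for Put/Delete objects).
    private final List<String> pending = new ArrayList<>();

    // Records every batch that was "written", so the behaviour can be inspected.
    final List<List<String>> flushedBatches = new ArrayList<>();

    // Called once before the first row: the place to open tables or a pool.
    void setup() {
        pending.clear();
    }

    // Called once per row: buffer the operation and flush when the buffer is full.
    void map(String rowKey) {
        pending.add(rowKey);
        if (pending.size() >= BATCH_SIZE) {
            flush();
        }
    }

    // Called once after the last row: flush the remainder and release resources.
    void cleanup() {
        flush();
    }

    // Writes the buffered operations as one batch, then empties the buffer.
    private void flush() {
        if (pending.isEmpty()) {
            return;
        }
        flushedBatches.add(new ArrayList<>(pending));
        pending.clear();
    }

    // Drives the lifecycle in the same order a map task would.
    static BufferedMoveSketch run(List<String> rows) {
        BufferedMoveSketch mapper = new BufferedMoveSketch();
        mapper.setup();
        for (String row : rows) {
            mapper.map(row);
        }
        mapper.cleanup();
        return mapper;
    }
}
```

With 25 input rows this produces two full batches of 10 from map and a final batch of 5 from cleanup. Note that each map task gets its own mapper instance, so the buffer and the final flush are per task, not per job.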

Now, I still have a few other questions.

1) context.getCurrentValue() can throw an InterruptedException, but
when can this occur? Is there a timeout on the Mapper side? Or is it if
the region goes down while the job is running?
2) How can I pass parameters to the map method? Can I use
job.getConfiguration().put to add some properties there, and get them
back with context.getConfiguration().get?
3) What's the best way to log results/exceptions/traces from the map method?
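
Editorial sketch for question 2. Hadoop's Configuration class exposes set(name, value) and get(name) rather than put, and the mapper can read the same values back through context.getConfiguration(). The code below shows that round trip with a tiny stand-in class so it runs without Hadoop on the classpath. ConfSketch, TimestampRangeSketch, the two property names, and the half-open range check are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in with the same set/get/getLong shape as
// org.apache.hadoop.conf.Configuration; values are stored as strings.
class ConfSketch {
    private final Map<String, String> props = new HashMap<>();

    void set(String name, String value) {
        props.put(name, value);
    }

    String get(String name) {
        return props.get(name);
    }

    // Returns the parsed value, or defaultValue when the property is unset.
    long getLong(String name, long defaultValue) {
        String value = props.get(name);
        return value == null ? defaultValue : Long.parseLong(value);
    }
}

class TimestampRangeSketch {
    // Property names invented for this example.
    static final String MIN_TS = "example.move.min.timestamp";
    static final String MAX_TS = "example.move.max.timestamp";

    // Driver side: store the parameters before the job is submitted.
    static void configure(ConfSketch conf, long minTs, long maxTs) {
        conf.set(MIN_TS, Long.toString(minTs));
        conf.set(MAX_TS, Long.toString(maxTs));
    }

    // Mapper side: read the parameters back (typically once, in setup)
    // and test a cell timestamp against the range [min, max).
    static boolean inRange(ConfSketch conf, long cellTs) {
        long min = conf.getLong(MIN_TS, 0L);
        long max = conf.getLong(MAX_TS, Long.MAX_VALUE);
        return cellTs >= min && cellTs < max;
    }
}
```

The values have to be set on the job's configuration before submission; the job configuration is serialized when the job is submitted, so later changes on the driver side are not expected to reach the map tasks.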

I will search on my side, but some help will be welcome because it
seems there is not much documentation when we start to dig a bit :(

JM

2012/10/27, Jean-Marc Spaggiari <[EMAIL PROTECTED]>:
> Hi,
>
> I'm thinking about my first MapReduce class and I have some questions.
>
> The goal of it will be to move some rows from one table to another one
> based on the timestamp only.
>
> Since this is pretty new for me, I'm starting from the RowCounter
> class to have a baseline.
>
> There are a few things I will have to update. First, the
> createSubmittableJob method to get a timestamp range instead of a key
> range, and "play" with the parameters. This part is fine.
>
> Next, I need to update the map method, and this is where I have some
> questions.
>
> I'm able to find the timestamp of all the cf:c from the
> context.getCurrentValue() method, that's fine. Now, my concern is on
> the way to get access to the table to store this field, and the table
> to delete it. Should I instantiate an HTable for the source table, and
> execute a delete on it, then do an insert on another HTable
> instance?  Should I use an HTablePool? Also, since I’m already on the
> row, can’t I just mark it as deleted instead of calling a new HTable?
>
> Also, instead of calling the delete and put one by one, I would like
> to put them on a list and execute it only when it’s over 10 members.
> How can I make sure that at the end of the job, this is flushed? Else,
> I will lose some operations. Is there a kind of “dispose” method
> called on the region when the job is done?
>
> Thanks,
>
> JM
>