Re: Question about MapReduce
Sorry, one last question.

In the map method, I have access to the row through the value
parameter. Now, based on the value's content, I might want to delete the row.
Do I have direct access to the table from one of the parameters? Or
should I issue the delete through an HTableInterface from my pool?

Thanks,

JM
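The thread never answers this last question directly. The map parameters don't hand you the table, but one option that avoids opening a table yourself is to make the job's output format TableOutputFormat. `TableMapReduceUtil.initTableReducerJob` with a null reducer, plus zero reduce tasks, does this. The mapper can then emit `Delete` objects through `context.write`. This is a minimal sketch, not the approach anyone in the thread confirmed. It assumes the HBase 0.94-era client API. The class name `DeleteRowsJob`, the column `f:status` and the marker value `obsolete` are hypothetical.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

public class DeleteRowsJob {

  // Hypothetical column and marker value used to decide which rows to drop.
  static final byte[] FAMILY = Bytes.toBytes("f");
  static final byte[] QUALIFIER = Bytes.toBytes("status");
  static final byte[] OBSOLETE = Bytes.toBytes("obsolete");

  // Pure decision rule, kept separate so it can be checked without a cluster.
  static boolean shouldDelete(byte[] status) {
    return Bytes.equals(status, OBSOLETE); // false when the column is absent (null)
  }

  static class DeleteMapper extends TableMapper<ImmutableBytesWritable, Delete> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      if (shouldDelete(value.getValue(FAMILY, QUALIFIER))) {
        // TableOutputFormat applies this Delete to the output table.
        context.write(row, new Delete(value.getRow()));
      }
    }
  }

  public static void main(String[] args) throws Exception {
    String table = args[0];
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "delete-obsolete-rows");
    job.setJarByClass(DeleteRowsJob.class);

    Scan scan = new Scan();
    scan.addColumn(FAMILY, QUALIFIER); // only fetch the column the rule needs
    scan.setCaching(500);
    scan.setCacheBlocks(false);        // a full scan shouldn't churn the block cache

    TableMapReduceUtil.initTableMapperJob(table, scan, DeleteMapper.class,
        ImmutableBytesWritable.class, Delete.class, job);
    // Null reducer + zero reduce tasks: map output goes straight to TableOutputFormat.
    TableMapReduceUtil.initTableReducerJob(table, null, job);
    job.setNumReduceTasks(0);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Here the input and output table are the same, so matching rows are deleted in place. Pointing `initTableReducerJob` at a different table name would send the mutations there instead.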

2012/11/2, Jean-Marc Spaggiari <[EMAIL PROTECTED]>:
> Yep, you understood my question perfectly.
>
> I just tried and it's working perfectly!
>
> Thanks a lot! I now have a lot to play with.
>
> JM
>
> 2012/11/2, Shrijeet Paliwal <[EMAIL PROTECTED]>:
>> JM,
>>
>> I personally would choose to put it in neither the Hadoop lib nor the
>> HBase lib directory. Have it go to your application's own install
>> directory.
>>
>> Then you could set the HADOOP_CLASSPATH variable to include your jar (plus
>> the HBase jars, HBase dependencies and any dependencies your program
>> needs), and fire the 'hadoop jar' command to execute it.
>>
>> An example[1]:
>>
>> Set classpath:
>> export HADOOP_CLASSPATH=`hbase classpath`:mycool.jar:mycooldependency.jar
>>
>> Fire the following to launch your job:
>> hadoop jar mycool.jar hbase.experiments.MyCoolProgram \
>>   -Dmapred.running.map.limit=50 \
>>   -Dmapred.map.tasks.speculative.execution=false aCommandLineArg
>>
>>
>> Did I get your question right?
>>
>> [1] In the example I gave, `hbase classpath` sets you up with all the
>> HBase jars.
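The `-D` options in that command only take effect if the main class passes its arguments through Hadoop's `GenericOptionsParser`, which is what `ToolRunner` does. Below is a minimal sketch of such an entry point. Only the name `MyCoolProgram` comes from the command. The body is an assumption, and the `hbase.experiments` package declaration is left out here.

```java
import java.util.Arrays;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyCoolProgram extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    // ToolRunner has already copied each -Dkey=value into getConf()
    // and stripped those options from args.
    String speculative = getConf().get("mapred.map.tasks.speculative.execution");
    System.out.println("speculative maps = " + speculative);
    System.out.println("remaining args   = " + Arrays.toString(args));
    // ... build and submit the Job from getConf() here ...
    return 0;
  }

  public static void main(String[] args) throws Exception {
    // HBaseConfiguration.create() layers hbase-site.xml over the Hadoop defaults.
    System.exit(ToolRunner.run(HBaseConfiguration.create(), new MyCoolProgram(), args));
  }
}
```

With the `hadoop jar` command above, `run` would see the two `-D` settings in its configuration and only `aCommandLineArg` in `args`.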
>>
>>
>>
>> On Fri, Nov 2, 2012 at 11:56 AM, Jean-Marc Spaggiari <
>> [EMAIL PROTECTED]> wrote:
>>
>>> Hi Shrijeet,
>>>
>>> Helped a lot! Thanks!
>>>
>>> Now, the only thing I need to know is where the best place is to put my
>>> JAR on the server. Should I put it in the Hadoop lib directory? Or
>>> somewhere in the HBase structure?
>>>
>>> Thanks,
>>>
>>> JM
>>>
>>> 2012/10/29, Shrijeet Paliwal <[EMAIL PROTECTED]>:
>>> > In line.
>>> >
>>> > On Mon, Oct 29, 2012 at 8:11 AM, Jean-Marc Spaggiari <
>>> > [EMAIL PROTECTED]> wrote:
>>> >
>>> >> I'm replying to myself ;)
>>> >>
>>> >> I found the "cleanup" and "setup" methods that TableMapper inherits
>>> >> from Mapper. So I think those are the methods I was looking for. I
>>> >> will initialize the HTablePool there. Please let me know if I'm wrong.
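That is a reasonable place for it. `setup()` runs once per task before the first `map()` call, and `cleanup()` runs once after the last. Below is a sketch of the pattern. It assumes the HBase 0.94-era `HTablePool` API (deprecated in later releases) and a hypothetical destination table `target_table`. It copies each scanned row there, in the spirit of moving rows between tables. A single task only needs one handle, so a plain `HTable` opened and closed in the same two methods would also work.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;

// Copies every row it is given into a second table (hypothetical name below).
public class CopyRowsMapper extends TableMapper<NullWritable, NullWritable> {
  static final String TARGET_TABLE = "target_table"; // hypothetical

  private HTablePool pool;
  private HTableInterface target;

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    // Once per task: build the pool from the job configuration and borrow a table.
    pool = new HTablePool(context.getConfiguration(), 1);
    target = pool.getTable(TARGET_TABLE);
  }

  // Rebuilds a Put holding every cell of the source row, timestamps included.
  static Put toPut(Result value) throws IOException {
    Put put = new Put(value.getRow());
    for (KeyValue kv : value.raw()) {
      put.add(kv);
    }
    return put;
  }

  @Override
  protected void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException, InterruptedException {
    target.put(toPut(value)); // sent immediately; batching left out for brevity
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    // Once per task: closing a pooled table returns it to the pool; then close the pool.
    if (target != null) {
      target.close();
    }
    if (pool != null) {
      pool.close();
    }
  }
}
```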
>>> >>
>>> >> Now, I still have a few other questions.
>>> >>
>>> >> 1) context.getCurrentValue() can throw an InterruptedException, but
>>> >> when can this occur? Is there a timeout on the Mapper side? Or is it
>>> >> when the region goes down while the job is running?
>>> >>
>>> >
>>> > You do not need to call context.getCurrentValue(). The 'value' argument
>>> > to the map method[1] has the information you are looking for.
>>> >
>>> >
>>> >> 2) How can I pass parameters to the map method? Can I use
>>> >> job.getConfiguration().set() to add some properties there, and get them
>>> >> back with context.getConfiguration().get()?
>>> >>
>>> >
>>> > Yes, that's how it is done.
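Below is a sketch of both ends. It assumes a hypothetical property name `myjob.cutoff.timestamp` and a hypothetical rule that counts rows whose newest cell is older than the cutoff. It reads the row from the `value` argument rather than calling getCurrentValue(). One catch: `new Job(conf)` works on a copy of `conf`. Set the property on `job.getConfiguration()` after creating the job, or on `conf` before creating it. Changes to `conf` made afterwards are not seen by the job.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;

public class CutoffMapper extends TableMapper<NullWritable, NullWritable> {
  // Hypothetical property name; any unique key works.
  static final String CUTOFF_KEY = "myjob.cutoff.timestamp";

  private long cutoff;

  // Driver side: call after the Job is created, before it is submitted.
  static void setCutoff(Job job, long cutoff) {
    job.getConfiguration().setLong(CUTOFF_KEY, cutoff);
  }

  // Task side: defaults to Long.MIN_VALUE, so nothing is counted if the property is unset.
  static long readCutoff(Configuration conf) {
    return conf.getLong(CUTOFF_KEY, Long.MIN_VALUE);
  }

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    cutoff = readCutoff(context.getConfiguration());
  }

  @Override
  protected void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException, InterruptedException {
    // Newest timestamp across all cells of this row.
    long newest = Long.MIN_VALUE;
    for (KeyValue kv : value.raw()) {
      newest = Math.max(newest, kv.getTimestamp());
    }
    if (newest < cutoff) {
      context.getCounter("CutoffMapper", "ROWS_OLDER_THAN_CUTOFF").increment(1);
    }
  }
}
```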
>>> >
>>> >
>>> >> 3) What's the best way to log results/exceptions/traces from the map
>>> >> method?
>>> >>
>>> >
>>> > In most cases, you'll have mapper and reducer classes as nested static
>>> > classes within some enclosing class. You can get a handle to the Logger
>>> > from the enclosing class and do your usual LOG.info, LOG.warn, yada yada.
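For example, the sketch below uses the commons-logging API that Hadoop and HBase use in this era. The class names are hypothetical. The messages are written by the task JVMs, so they land in each task attempt's logs, not on the console where `hadoop jar` was launched. On Hadoop 1.x those logs are reachable from the JobTracker web UI.

```java
import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;

public class RowAuditJob {
  // One logger for the enclosing class, shared by its nested mapper.
  private static final Log LOG = LogFactory.getLog(RowAuditJob.class);

  // Row keys are binary; toStringBinary escapes non-printable bytes.
  static String rowLabel(byte[] rowKey) {
    return Bytes.toStringBinary(rowKey);
  }

  static class AuditMapper extends TableMapper<NullWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      if (LOG.isDebugEnabled()) {
        LOG.debug("Row " + rowLabel(value.getRow()) + " has " + value.size() + " cells");
      }
      try {
        // ... per-row work goes here ...
      } catch (RuntimeException e) {
        // Log with the row key for context, then let the task fail as usual.
        LOG.warn("Failed on row " + rowLabel(value.getRow()), e);
        throw e;
      }
    }
  }
}
```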
>>> >
>>> > Hope it helps.
>>> >
>>> > [1] map(KEYIN key, *VALUEIN value*, Context context)
>>> >
>>> >>
>>> >> I will search on my side, but some help would be welcome because there
>>> >> doesn't seem to be much documentation once you start to dig a bit :(
>>> >>
>>> >> JM
>>> >>
>>> >> 2012/10/27, Jean-Marc Spaggiari <[EMAIL PROTECTED]>:
>>> >> > Hi,
>>> >> >
>>> >> > I'm thinking about my first MapReduce class and I have some
>>> >> > questions.
>>> >> >
>>> >> > The goal of it will be to move some rows from one table to another
>>> >> > one
>>> >> > based on the timestamp only.
>>> >> >
>>> >> > Since this is pretty new for me, I'm starting from the RowCounter
>>> >> > class to have a baseline.
>>> >> >
>>> >> > There are a few things I will have to update. First, the
>>> >>