HBase, mail # user - Increment Counters in HBase during MapReduce

Re: Increment Counters in HBase during MapReduce
Michael Segel 2012-06-24, 23:19
There are a couple of issues and I'm sure others will point them out.

If you turn off speculative execution for the job, you won't get duplicate attempts of the same task running in parallel.
You could create a table that stores your aggregations on a per-job basis, with a row key that incorporates the job ID.
Then, at the end of the job, if there were no task failures and no speculative task attempts, you can count on your aggregations being correct.
If a task failed or was killed (a simple check, in case the job ran with speculative execution for some reason), you can discard the data in that job's row.
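
As a rough illustration of the per-job row idea above, here is a minimal Java sketch. It uses an in-memory map as a stand-in for the HBase table, so nothing in it is the HBase API. The class name JobScopedCounters, the "jobId:metric" row-key layout, and all helper methods are hypothetical and exist only for this example. It shows why a re-run task attempt double counts, and how keying the row by job ID lets you throw away a tainted result.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for an aggregation table. All names are hypothetical.
final class JobScopedCounters {
    // Row key -> counter value; plays the role of one counter column.
    private final Map<String, Long> table = new HashMap<>();

    // Assumed row-key layout: the job ID is the prefix of the row key.
    static String rowKey(String jobId, String metric) {
        return jobId + ":" + metric;
    }

    // Adds delta to the counter and returns the new value. This is not
    // idempotent: applying it twice counts the same input twice.
    long increment(String rowKey, long delta) {
        return table.merge(rowKey, delta, Long::sum);
    }

    long get(String rowKey) {
        return table.getOrDefault(rowKey, 0L);
    }

    // Simulates one task attempt that increments once per input record.
    void runTaskAttempt(String jobId, String metric, int records) {
        for (int i = 0; i < records; i++) {
            increment(rowKey(jobId, metric), 1L);
        }
    }

    // End-of-job check: keep the rows only if no attempt failed, was
    // killed, or ran speculatively; otherwise discard the job's rows.
    boolean finishJob(String jobId, boolean hadFailedOrDuplicateAttempts) {
        if (hadFailedOrDuplicateAttempts) {
            table.keySet().removeIf(k -> k.startsWith(jobId + ":"));
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        JobScopedCounters counters = new JobScopedCounters();

        // Clean job: a single attempt over 100 records.
        counters.runTaskAttempt("job_001", "impressions", 100);
        boolean trusted = counters.finishJob("job_001", false);
        System.out.println("job_001 trusted=" + trusted + " count="
                + counters.get(rowKey("job_001", "impressions")));

        // Tainted job: the same 100 records are processed by two attempts.
        counters.runTaskAttempt("job_002", "impressions", 100);
        counters.runTaskAttempt("job_002", "impressions", 100);
        System.out.println("job_002 before check count="
                + counters.get(rowKey("job_002", "impressions")));
        trusted = counters.finishJob("job_002", true);
        System.out.println("job_002 trusted=" + trusted + " count="
                + counters.get(rowKey("job_002", "impressions")));
    }
}
```

In a real job, the "had failed or duplicate attempts" flag would have to come from whatever the framework reports about the finished job, and a discarded job would need to be re-run under a new job ID before its totals are merged anywhere else.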

On Jun 24, 2012, at 4:15 PM, David Koch wrote:

> Hello J-D
> I have a requirement similar to the one presented by the original poster, i.e.
> updating a totals count without having to push the entire data set through
> the Mapper again.
> Are you advising against calling incrementColumnValue on a mapper's HTable
> instance because the operation is not idempotent or are there other
> reasons? It is even suggested in the docs:
> http://hbase.apache.org/book/mapreduce.example.html (section 7.2.6).
> Do you know of any "count-exactly-once" implementations on top of Hadoop
> Map/Reduce?
> Thanks,
> /David
> On Tue, Jun 19, 2012 at 6:55 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
>> This question was answered here already:
>> http://mail-archives.apache.org/mod_mbox/hbase-user/201101.mbox/%[EMAIL PROTECTED]%3E
>> Counter increments are not idempotent, which can be hard to manage.
>> J-D
>> On Mon, Jun 18, 2012 at 5:49 PM, Sid Kumar <[EMAIL PROTECTED]> wrote:
>>> Hi everyone,
>>>   I have a use case in HBase that I was wondering if someone may have
>>> stumbled upon. I am maintaining an ad impressions table with columns that
>>> are counters for certain metrics. I started using the incrementColumnValue
>>> method, part of the HTable API, to update these metrics and that works great.
>>>   I was wondering if this function could be used from a MapReduce job.
>>> The TableOutputFormat supports only Delete and Put operations. Using the
>>> increment counters saves me from doing any aggregations in my MapReduce
>>> code. Ideally I would like to just call this function in my mapper and
>>> wouldn't even need a Reducer.
>>>   Has anyone run into this use case? I would also love to know if there
>>> are any better alternatives for solving this. Any info would be great.
>>> Thanks
>>> Sid