Hadoop, mail # user - Can I number output results with a Counter?


Re: Can I number output results with a Counter?
Mark Kerzner 2011-05-20, 17:17
Joey,

You understood me perfectly well. I see the point of your first suggestion,
but I am not allowed to have gaps. A central service is something I may
consider if the single reducer becomes a worse bottleneck than the service
would be.

But what are counters for? They seem to be exactly that.

Mark

On Fri, May 20, 2011 at 12:01 PM, Joey Echeverria <[EMAIL PROTECTED]> wrote:

> To make sure I understand you correctly, you need a globally unique
> one up counter for each output record?
>
> If you had an upper bound on the number of records a single reducer
> could output and you can afford to have gaps, you could just use the
> task id and multiply that by the max number of records and then one up
> from there.
>
> If that doesn't work for you, then you'll need to use some kind of
> central service for allocating numbers which could become a
> bottleneck.
>
> -Joey
>
> On Fri, May 20, 2011 at 9:55 AM, Mark Kerzner <[EMAIL PROTECTED]>
> wrote:
> > Hi, can I use a Counter to give each record in all reducers a consecutive
> > number? Currently I am using a single Reducer, but it is an anti-pattern.
> > But I need to assign consecutive numbers to all output records in all
> > reducers, and it does not matter how, as long as each gets its own
> > number.
> >
> > If it IS possible, then how do multiple processes access those counters
> > without creating race conditions?
> >
> > Thank you,
> >
> > Mark
> >
>
>
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
>
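
The task-id scheme Joey describes above (multiply the task id by an upper bound on records per reducer, then count up from there) can be sketched roughly as follows. This is only an illustration of the arithmetic, not Hadoop API code: the function names, the zero-based task id, and the bound of 1000 are all assumptions made for the example.

```python
def global_id(task_id: int, local_index: int, max_records_per_task: int) -> int:
    """Return a globally unique number for one output record.

    task_id: zero-based reducer task number (assumed known inside the task).
    local_index: zero-based position of the record in that task's output.
    max_records_per_task: assumed upper bound on records from any one reducer.
    """
    if not 0 <= local_index < max_records_per_task:
        # The scheme only stays collision-free while the bound holds.
        raise ValueError("local_index is outside the assumed per-task bound")
    return task_id * max_records_per_task + local_index


def number_records(task_id, records, max_records_per_task):
    """Pair each record emitted by one reducer task with its global number."""
    return [
        (global_id(task_id, i, max_records_per_task), record)
        for i, record in enumerate(records)
    ]


if __name__ == "__main__":
    # Two hypothetical reducers with an assumed bound of 1000 records each.
    print(number_records(0, ["a", "b"], 1000))  # [(0, 'a'), (1, 'b')]
    print(number_records(1, ["c"], 1000))       # [(1000, 'c')]
```

Each reducer numbers its records without talking to any other task, so there is no shared state to race on. The numbers are unique but not consecutive: in the example, 2 through 999 are never used, which is exactly the gap Mark says he cannot accept.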