Hadoop >> mail # user >> Can I number output results with a Counter?


Re: Can I number output results with a Counter?
Joey,

You understood me perfectly well. I see your first suggestion, but I am not
allowed to have gaps. A central service is something I may consider if the
single reducer turns out to be a worse bottleneck than the service would be.

But what are counters for? They seem to be exactly that.

Mark

On Fri, May 20, 2011 at 12:01 PM, Joey Echeverria <[EMAIL PROTECTED]> wrote:

> To make sure I understand you correctly, you need a globally unique
> one up counter for each output record?
>
> If you had an upper bound on the number of records a single reducer
> could output and you can afford to have gaps, you could just use the
> task id and multiply that by the max number of records and then one up
> from there.
>
> If that doesn't work for you, then you'll need to use some kind of
> central service for allocating numbers which could become a
> bottleneck.
>
> -Joey
>
> On Fri, May 20, 2011 at 9:55 AM, Mark Kerzner <[EMAIL PROTECTED]>
> wrote:
> > Hi, can I use a Counter to give each record in all reducers a consecutive
> > number? Currently I am using a single Reducer, but that is an anti-pattern.
> > I need to assign consecutive numbers to all output records in all
> > reducers, and it does not matter how, as long as each gets its own number.
> >
> > If it IS possible, then how do multiple processes access those counters
> > without creating race conditions?
> >
> > Thank you,
> >
> > Mark
> >
>
>
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
>
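
For reference, Joey's first suggestion (task id times the maximum record count, then one up from there) could be sketched roughly as below. This is only an illustration in plain Java, not Hadoop code: the class name RecordNumbering, the method globalId, and the bound of 1000 records per reducer are all made up for the example, and a real job would have to supply the reducer's task id itself. As noted in the thread, the numbers are unique but leave gaps whenever a reducer emits fewer records than the bound.

```java
public class RecordNumbering {

    /**
     * Hypothetical helper (not a Hadoop API): maps a reducer's task id and the
     * record's position within that reducer to a globally unique number.
     * Assumes every reducer emits fewer than maxRecordsPerReducer records.
     */
    static long globalId(int taskId, long maxRecordsPerReducer, long localIndex) {
        if (taskId < 0) {
            throw new IllegalArgumentException("taskId must be non-negative");
        }
        if (localIndex < 0 || localIndex >= maxRecordsPerReducer) {
            throw new IllegalArgumentException("localIndex outside the assumed per-reducer bound");
        }
        // Each task owns the block [taskId * max, (taskId + 1) * max).
        // The *Exact methods throw ArithmeticException instead of silently overflowing.
        return Math.addExact(Math.multiplyExact((long) taskId, maxRecordsPerReducer), localIndex);
    }

    public static void main(String[] args) {
        long max = 1000L; // assumed upper bound on records per reducer

        System.out.println(globalId(0, max, 0));   // first record of reducer 0
        System.out.println(globalId(0, max, 999)); // last slot of reducer 0
        System.out.println(globalId(1, max, 0));   // first record of reducer 1
        System.out.println(globalId(2, max, 5));   // sixth record of reducer 2
    }
}
```

No coordination between reducers is needed because each task only ever writes into its own block of numbers, which is also why this scheme cannot give the gap-free numbering Mark asks for.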