Counters are a way to get status from your running job. They don't
increment shared global state. Each task accumulates its increments
locally and periodically reports them to the central counter. That
means the final count will be correct, but you can't use counters to
coordinate counts across tasks while your job is running.
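
To make that concrete, here is a minimal sketch in plain Java. It is a toy
model, not the Hadoop API: CentralCounter, TaskCounter, and the flushEvery
batch size are hypothetical names, and the batch size only stands in for
the periodic report a real task would make.

```java
/** Toy model of counter aggregation. Not the Hadoop API; all names are hypothetical. */
final class CounterSketch {

    /** Stands in for the job-wide counter that receives reports from tasks. */
    static final class CentralCounter {
        private long total = 0;

        synchronized void report(long delta) {
            total += delta;
        }

        synchronized long get() {
            return total;
        }
    }

    /** Stands in for one task: increments are buffered locally and sent in batches. */
    static final class TaskCounter {
        private final CentralCounter central;
        private final int flushEvery; // assumed reporting interval, measured in increments
        private long pending = 0;

        TaskCounter(CentralCounter central, int flushEvery) {
            this.central = central;
            this.flushEvery = flushEvery;
        }

        void increment() {
            pending++;
            if (pending >= flushEvery) {
                flush();
            }
        }

        /** Sends whatever is buffered; a task would do this one last time when it finishes. */
        void flush() {
            central.report(pending);
            pending = 0;
        }
    }

    public static void main(String[] args) {
        CentralCounter central = new CentralCounter();
        TaskCounter a = new TaskCounter(central, 100);
        TaskCounter b = new TaskCounter(central, 100);

        for (int i = 0; i < 250; i++) a.increment();
        for (int i = 0; i < 130; i++) b.increment();

        // Mid-job the central value lags behind the 380 increments actually made.
        System.out.println("mid-job central value: " + central.get());

        a.flush();
        b.flush();

        // After the final reports the central value matches the true total.
        System.out.println("final central value: " + central.get());
    }
}
```

While the tasks are running, neither one can see the other's pending
increments, so a value read mid-job can't serve as a consecutive record
number.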
On Fri, May 20, 2011 at 10:17 AM, Mark Kerzner <[EMAIL PROTECTED]> wrote:
> You understood me perfectly well. I see your first advice, but I am not
> allowed to have gaps. A central service is something I may consider if
> a single reducer becomes a worse bottleneck than the service would be.
> But what are counters for? They seem to be exactly that.
> On Fri, May 20, 2011 at 12:01 PM, Joey Echeverria <[EMAIL PROTECTED]> wrote:
>> To make sure I understand you correctly, you need a globally unique,
>> one-up counter value for each output record?
>> If you have an upper bound on the number of records a single reducer
>> could output and you can afford to have gaps, you could just use the
>> task id, multiply it by the max number of records, and then count up
>> by one from there.
>> If that doesn't work for you, then you'll need to use some kind of
>> central service for allocating numbers, which could become a bottleneck.
>> On Fri, May 20, 2011 at 9:55 AM, Mark Kerzner <[EMAIL PROTECTED]> wrote:
>> > Hi, can I use a Counter to give each record in all reducers a consecutive
>> > number? Currently I am using a single Reducer, but it is an anti-pattern.
>> > But I need to assign consecutive numbers to all output records in all
>> > reducers, and it does not matter how, as long as each gets its own
>> > number.
>> > If it IS possible, then how are multiple processes accessing those
>> > counters without creating race conditions?
>> > Thank you,
>> > Mark
>> Joseph Echeverria
>> Cloudera, Inc.
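
The task-id scheme suggested earlier in the thread (multiply the task id by
the maximum number of records, then count up from there) can be sketched
roughly as follows. This is plain Java rather than a real reducer:
TaskIdSequence and maxRecordsPerTask are hypothetical names, and the bound
of 1,000,000 records per task is only an assumed example value. In an
actual job the task id would have to come from the task's own context.

```java
/** Sketch of gap-tolerant unique ids: taskId * maxRecordsPerTask + local sequence number. */
final class TaskIdSequence {
    private final long base;              // first id owned by this task
    private final long maxRecordsPerTask; // assumed upper bound on records per task
    private long next = 0;                // local sequence number, starts at 0

    TaskIdSequence(int taskId, long maxRecordsPerTask) {
        if (taskId < 0 || maxRecordsPerTask <= 0) {
            throw new IllegalArgumentException("taskId must be >= 0 and the bound must be > 0");
        }
        this.maxRecordsPerTask = maxRecordsPerTask;
        // multiplyExact throws instead of silently overflowing the id space.
        this.base = Math.multiplyExact((long) taskId, maxRecordsPerTask);
    }

    /** Returns the next id in this task's range; fails if the bound was too small. */
    long nextId() {
        if (next >= maxRecordsPerTask) {
            throw new IllegalStateException("task produced more records than the assumed bound");
        }
        return Math.addExact(base, next++);
    }

    public static void main(String[] args) {
        long max = 1_000_000L; // assumed bound, for illustration only
        TaskIdSequence reducer0 = new TaskIdSequence(0, max);
        TaskIdSequence reducer1 = new TaskIdSequence(1, max);

        System.out.println(reducer0.nextId()); // 0
        System.out.println(reducer0.nextId()); // 1
        System.out.println(reducer1.nextId()); // 1000000
    }
}
```

Each task owns a disjoint range, so no coordination is needed, but any
task that emits fewer records than the bound leaves a gap before the next
task's range begins. That is why this does not help when gaps are not
allowed.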