Hadoop, mail # user - Can I number output results with a Counter?


Re: Can I number output results with a Counter?
Mark Kerzner 2011-05-20, 18:38
Thank you, Kai and Joey, for the explanation. That's what I thought about
them, but I did not want to miss a "magical" replacement for a central
service in the counters. No, there is no magic, just great reality.

Mark
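
A toy model of the counter behaviour described in Joey's quoted explanation, as a minimal sketch: each task accumulates increments locally and only periodically reports them to a central total. This is plain Python, not the Hadoop API; `CentralCounter`, `TaskCounter`, and `report_every` are hypothetical names chosen for illustration. It suggests why the final count is correct while the value seen mid-job cannot serve as a live sequence number.

```python
class CentralCounter:
    """Stands in for the job-wide total that tasks report to."""

    def __init__(self):
        self.total = 0

    def report(self, delta):
        self.total += delta


class TaskCounter:
    """Task-local view: buffers increments and reports them in batches."""

    def __init__(self, central, report_every=3):
        self.central = central
        self.report_every = report_every  # hypothetical reporting interval
        self.pending = 0  # increments not yet reported centrally
        self.local = 0    # everything this task has counted so far

    def increment(self):
        self.local += 1
        self.pending += 1
        if self.pending >= self.report_every:
            self.flush()

    def flush(self):
        self.central.report(self.pending)
        self.pending = 0


central = CentralCounter()
task_a = TaskCounter(central)
task_b = TaskCounter(central)

for _ in range(4):
    task_a.increment()
for _ in range(2):
    task_b.increment()

# Mid-job, the central total lags behind: only task_a's first batch arrived.
mid_job_total = central.total

# Once every task has reported, the total is correct.
task_a.flush()
task_b.flush()
final_total = central.total
```

In this run the central total is 3 while the tasks have actually counted 6 records between them, and it reaches 6 only after both tasks flush.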

On Fri, May 20, 2011 at 12:39 PM, Kai Voigt <[EMAIL PROTECTED]> wrote:

> Also, with speculative execution enabled, you might see a higher count than
> you expect while the same task is running multiple times in parallel. When a
> task gets killed because another instance was quicker, its counters will
> be removed from the global count, though.
>
> Kai
>
> On 20.05.2011 at 19:34, Joey Echeverria wrote:
>
> > Counters are a way to get status from your running job. They don't
> > increment a global state. They locally save increments and
> > periodically report those increments to the central counter. That
> > means that the final count will be correct, but you can't use them to
> > coordinate counts while your job is running.
> >
> > -Joey
> >
> > On Fri, May 20, 2011 at 10:17 AM, Mark Kerzner <[EMAIL PROTECTED]> wrote:
> >> Joey,
> >>
> >> You understood me perfectly well. I see your first piece of advice, but I
> >> am not allowed to have gaps. A central service is something I may consider
> >> if a single reducer becomes a worse bottleneck than the service would be.
> >>
> >> But what are counters for? They seem to be exactly that.
> >>
> >> Mark
> >>
> >> On Fri, May 20, 2011 at 12:01 PM, Joey Echeverria <[EMAIL PROTECTED]> wrote:
> >>
> >>> To make sure I understand you correctly, you need a globally unique
> >>> one up counter for each output record?
> >>>
> >>> If you had an upper bound on the number of records a single reducer
> >>> could output and you can afford to have gaps, you could just multiply
> >>> the task ID by the maximum number of records and then count up by one
> >>> from there.
> >>>
> >>> If that doesn't work for you, then you'll need to use some kind of
> >>> central service for allocating numbers which could become a
> >>> bottleneck.
> >>>
> >>> -Joey
> >>>
> >>> On Fri, May 20, 2011 at 9:55 AM, Mark Kerzner <[EMAIL PROTECTED]> wrote:
> >>>> Hi, can I use a Counter to give each record in all reducers a
> >>>> consecutive number? Currently I am using a single Reducer, but it is an
> >>>> anti-pattern. But I need to assign consecutive numbers to all output
> >>>> records in all reducers, and it does not matter how, as long as each
> >>>> gets its own number.
> >>>>
> >>>> If it IS possible, then how are multiple processes accessing those
> >>>> counters without creating race conditions?
> >>>>
> >>>> Thank you,
> >>>>
> >>>> Mark
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Joseph Echeverria
> >>> Cloudera, Inc.
> >>> 443.305.9434
> >>>
> >>
> >
> >
> >
> > --
> > Joseph Echeverria
> > Cloudera, Inc.
> > 443.305.9434
> >
>
> --
> Kai Voigt
> [EMAIL PROTECTED]
>
>
>
>
>
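
Joey's gap-tolerant scheme from the thread (task ID times an upper bound on records per reducer, then counting up) might look roughly like the sketch below. It is plain Python rather than Hadoop code; `MAX_RECORDS_PER_TASK`, `record_ids`, and the sample task IDs are assumptions made for the example. IDs come out unique across reducers without any coordination, but they are not consecutive, which is the limitation Mark raised.

```python
MAX_RECORDS_PER_TASK = 1_000  # assumed upper bound on one reducer's output


def record_ids(task_id, num_records, max_records=MAX_RECORDS_PER_TASK):
    """Yield a unique ID for each record a single reducer task emits."""
    if num_records > max_records:
        raise ValueError("reducer exceeded the assumed upper bound")
    base = task_id * max_records  # start of this task's private ID block
    for offset in range(num_records):
        yield base + offset


# Two hypothetical reducer tasks numbering their own output independently.
ids_task0 = list(record_ids(task_id=0, num_records=3))
ids_task1 = list(record_ids(task_id=1, num_records=2))
```

Here task 0 produces 0, 1, 2 and task 1 starts at 1000, so IDs 3 through 999 stay unused. Closing those gaps would require knowing every reducer's record count before numbering begins, or a single point of allocation as discussed in the thread.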