

Re: Can I number output results with a Counter?
Thank you, Kai and Joey, for the explanation. That's what I thought about
them, but I did not want to miss a "magical" replacement for a central
service in the counters. No, there is no magic, just great reality.

Mark

On Fri, May 20, 2011 at 12:39 PM, Kai Voigt <[EMAIL PROTECTED]> wrote:

> Also, with speculative execution enabled, you might see a higher count than
> you expect while the same task is running multiple times in parallel. When a
> task gets killed because another instance was quicker, those counters will
> be removed from the global count, though.
>
> Kai
>
> On 20.05.2011 at 19:34, Joey Echeverria wrote:
>
> > Counters are a way to get status from your running job. They don't
> > increment a global state. They locally save increments and
> > periodically report those increments to the central counter. That
> > means that the final count will be correct, but you can't use them to
> > coordinate counts while your job is running.
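
The behaviour described above can be sketched in plain Java, without any Hadoop dependency. This is only a toy model: the class names (`LaggingCounterDemo`, `Central`, `TaskCounter`) and the `reportEvery` threshold are hypothetical, and real tasks report on a time basis rather than after a fixed number of increments. It is meant to show why the central value lags behind during the run yet is correct once everything has been reported.

```java
// Toy model of a counter that buffers increments locally and reports deltas.
// All names here are hypothetical; this is not the Hadoop Counter API.
final class LaggingCounterDemo {

    /** Stands in for the central, job-wide counter value. */
    static final class Central {
        private long total;

        void report(long delta) { total += delta; }

        long total() { return total; }
    }

    /** Stands in for the counter as seen inside one task. */
    static final class TaskCounter {
        private final Central central;
        private final int reportEvery; // simplification: flush by count, not by time
        private long unreported;

        TaskCounter(Central central, int reportEvery) {
            this.central = central;
            this.reportEvery = reportEvery;
        }

        void increment() {
            unreported++;
            if (unreported >= reportEvery) {
                flush();
            }
        }

        /** Sends the buffered delta to the central counter, e.g. at task end. */
        void flush() {
            central.report(unreported);
            unreported = 0;
        }
    }

    public static void main(String[] args) {
        Central central = new Central();
        TaskCounter task = new TaskCounter(central, 100);
        for (int i = 0; i < 250; i++) {
            task.increment();
        }
        // Mid-run, the central view is behind the task's true count of 250.
        System.out.println("central while running: " + central.total());
        task.flush();
        System.out.println("central after final report: " + central.total());
    }
}
```

A task that tried to read the central value to pick the next record number would see a stale figure, which is why the final count can be trusted but the intermediate one cannot be used for coordination.
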
> >
> > -Joey
> >
> > On Fri, May 20, 2011 at 10:17 AM, Mark Kerzner <[EMAIL PROTECTED]> wrote:
> >> Joey,
> >>
> >> You understood me perfectly well. I see your first advice, but I am not
> >> allowed to have gaps. A central service is something I may consider if
> >> a single reducer becomes a worse bottleneck than the service would be.
> >>
> >> But what are counters for? They seem to be exactly that.
> >>
> >> Mark
> >>
> >> On Fri, May 20, 2011 at 12:01 PM, Joey Echeverria <[EMAIL PROTECTED]> wrote:
> >>
> >>> To make sure I understand you correctly, you need a globally unique
> >>> one up counter for each output record?
> >>>
> >>> If you had an upper bound on the number of records a single reducer
> >>> could output and you can afford to have gaps, you could just use the
> >>> task id and multiply that by the max number of records and then one up
> >>> from there.
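
A minimal sketch of that scheme, assuming the task id is available as a plain non-negative integer and that the upper bound is known in advance. The class name `GappedRecordNumberer` and its parameters are hypothetical; in a real job the task id would come from the task's context rather than a constructor argument.

```java
// Sketch of "task id times max records, then one up from there".
// Numbers are globally unique but leave gaps between tasks.
final class GappedRecordNumberer {
    private final long limit; // one past the last number this task may use
    private long next;        // next number to hand out

    GappedRecordNumberer(int taskId, long maxRecordsPerTask) {
        if (taskId < 0 || maxRecordsPerTask <= 0) {
            throw new IllegalArgumentException("taskId must be >= 0 and bound must be > 0");
        }
        // multiplyExact/addExact fail loudly instead of silently overflowing.
        long base = Math.multiplyExact((long) taskId, maxRecordsPerTask);
        this.limit = Math.addExact(base, maxRecordsPerTask);
        this.next = base;
    }

    /** Returns the next number in this task's private range. */
    long nextNumber() {
        if (next >= limit) {
            throw new IllegalStateException("task produced more records than the assumed upper bound");
        }
        return next++;
    }

    public static void main(String[] args) {
        GappedRecordNumberer task0 = new GappedRecordNumberer(0, 1000);
        GappedRecordNumberer task2 = new GappedRecordNumberer(2, 1000);
        System.out.println(task0.nextNumber()); // 0
        System.out.println(task0.nextNumber()); // 1
        System.out.println(task2.nextNumber()); // 2000
    }
}
```

Each task owns a disjoint range, so no coordination is needed at run time. The cost is the gaps: if task 0 emits only two records, numbers 2 through 999 are never used.
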
> >>>
> >>> If that doesn't work for you, then you'll need to use some kind of
> >>> central service for allocating numbers which could become a
> >>> bottleneck.
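
To illustrate the central-service option, here is an in-process stand-in, assuming one atomic counter shared by all reducers. The names `CentralNumberAllocator`, `allocate`, and `simulate` are hypothetical, and threads play the role of reducers. A real service would sit behind a network call, so every record would cost a round trip, which is where the bottleneck comes from.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;

// In-process stand-in for a central number-allocation service.
final class CentralNumberAllocator {
    private final AtomicLong next = new AtomicLong(0);

    /** Hands out the next number; atomic, so concurrent callers never collide. */
    long allocate() {
        return next.getAndIncrement();
    }

    /** Runs `reducers` threads that each request numbers; returns all numbers, sorted. */
    static long[] simulate(int reducers, int recordsPerReducer) {
        CentralNumberAllocator service = new CentralNumberAllocator();
        long[][] perReducer = new long[reducers][recordsPerReducer];
        Thread[] threads = new Thread[reducers];
        for (int r = 0; r < reducers; r++) {
            final long[] mine = perReducer[r];
            threads[r] = new Thread(() -> {
                for (int i = 0; i < mine.length; i++) {
                    mine[i] = service.allocate(); // one "request" per output record
                }
            });
            threads[r].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return Arrays.stream(perReducer).flatMapToLong(Arrays::stream).sorted().toArray();
    }

    public static void main(String[] args) {
        long[] all = simulate(4, 1000);
        System.out.println("allocated " + all.length + " numbers, from "
                + all[0] + " to " + all[all.length - 1]);
    }
}
```

In this simulation every allocated number is used, so the result is consecutive with no gaps. In a real job, failed or speculatively executed task attempts could still request numbers that never reach the output, so the service alone does not guarantee a gap-free sequence.
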
> >>>
> >>> -Joey
> >>>
> >>> On Fri, May 20, 2011 at 9:55 AM, Mark Kerzner <[EMAIL PROTECTED]>
> >>> wrote:
> >>>> Hi, can I use a Counter to give each record in all reducers a
> >>>> consecutive number? Currently I am using a single Reducer, but it is
> >>>> an anti-pattern. But I need to assign consecutive numbers to all output
> >>>> records in all reducers, and it does not matter how, as long as each
> >>>> gets its own number.
> >>>>
> >>>> If it IS possible, then how are multiple processes accessing those
> >>>> counters without creating race conditions?
> >>>>
> >>>> Thank you,
> >>>>
> >>>> Mark
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Joseph Echeverria
> >>> Cloudera, Inc.
> >>> 443.305.9434
> >>>
> >>
> >
> >
> >
> > --
> > Joseph Echeverria
> > Cloudera, Inc.
> > 443.305.9434
> >
>
> --
> Kai Voigt
> [EMAIL PROTECTED]
>
>
>
>
>