Pig, mail # user - simple script generating 'too many counters' error


Re: simple script generating 'too many counters' error
Lauren Blau 2013-04-05, 15:40
Now that I've turned off noSplitCombination, we have 640 mappers.
The relation being ranked is likely in the billions, or 1+ trillion records.
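A quick sanity check on those numbers (a sketch; the one-counter-per-task figure is an assumption based on the skim of the Rank code quoted below, not confirmed in this thread):

```python
# Back-of-the-envelope check: if RANK registers at least one counter per
# map task (assumed lower bound, per the reading of the Rank code in the
# reply below), 640 mappers exceed a 240-counter job limit on their own.
mappers = 640            # observed once noSplitCombination was turned off
counter_limit = 240      # the max reported in the LimitExceededException

rank_counters = mappers * 1   # >= 1 counter per task (assumed lower bound)
print(rank_counters > counter_limit)  # True: the job must exceed the limit
```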

On Fri, Apr 5, 2013 at 10:47 AM, Bill Graham <[EMAIL PROTECTED]> wrote:

> How many mappers and reducers do you have? Skimming the Rank code, it looks
> like it creates at least N counters per task, which would be a scalability
> bug.
>
> On Friday, April 5, 2013, Lauren Blau wrote:
>
>> This is definitely caused by the RANK operator. Is there some way to
>> reduce the number of counters generated by this operator when working
>> with large data?
>> Thanks
>>
>> On Thu, Apr 4, 2013 at 7:01 PM, Lauren Blau <[EMAIL PROTECTED]> wrote:
>>
>>> I can think of only 2 things that have changed since this script last
>>> ran successfully: I switched to using the range specification of the
>>> schema for a2, and the input data has grown considerably.
>>>
>>> Lauren
>>>
>>> On Thu, Apr 4, 2013 at 7:00 PM, Lauren Blau <[EMAIL PROTECTED]> wrote:
>>>
>>>> No.
>>>>
>>>> On Thu, Apr 4, 2013 at 4:54 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Do you have any special properties set?
>>>>> Like the pig.udf.profile one, maybe.
>>>>> D
>>>>>
>>>>> On Thu, Apr 4, 2013 at 6:25 AM, Lauren Blau <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> I'm running a simple script to add a sequence_number to a relation,
>>>>>> sort the result, and store it to a file:
>>>>>>
>>>>>> a0 = load '<filename>' using PigStorage('\t','-schema');
>>>>>> a1 = rank a0;
>>>>>> a2 = foreach a1 generate col1 .. col16, rank_a0 as sequence_number;
>>>>>> a3 = order a2 by sequence_number;
>>>>>> store a3 into 'outputfile' using PigStorage('\t','-schema');
>>>>>>
>>>>>> I get the following error:
>>>>>>
>>>>>> org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 241 max=240
>>>>>>     at org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:61)
>>>>>>     at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:68)
>>>>>>     at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.readFields(AbstractCounterGroup.java:174)
>>>>>>     at org.apache.hadoop.mapred.Counters$Group.readFields(Counters.java:278)
>>>>>>     at org.apache.hadoop.mapreduce.counters.AbstractCounters.readFields(AbstractCounters.java:303)
>>>>>>     at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:280)
>>>>>>     at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:75)
>>>>>>     at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:951)
>>>>>>     at org.apache.hadoop.ipc.Client$Connection.run(Client.java:835)
>>>>>>
>>>>>> We aren't able to raise our counter limit any higher (policy), and I
>>>>>> don't understand why I should need so many counters for such a simple
>>>>>> script anyway.
>>>>>>
>>>>>> Running Apache Pig version 0.11.1-SNAPSHOT (r: unknown),
>>>>>> compiled Mar 22 2013, 10:19:19.
>>>>>>
>>>>>> Can someone help?
>>>>>>
>>>>>> Thanks,
>>>>>> Lauren
>
> --
> Sent from Gmail Mobile
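For anyone landing on this thread later: a hedged sketch of the usual escape hatch, assuming (unlike the cluster above, where policy forbids it) the counter ceiling may be raised. The property name varies by Hadoop version, and clusters that enforce the limit server-side will simply ignore a client-side `set`:

```pig
-- Sketch only, not from this thread: raise the per-job counter ceiling so
-- RANK's per-task counters (one per mapper, so roughly 640 here) fit.
-- The property is spelled mapreduce.job.counters.limit on Hadoop 1.x and
-- mapreduce.job.counters.max on 2.x; if the cluster enforces the limit
-- server-side, this client-side set may have no effect.
set mapreduce.job.counters.max 1000;

a0 = load '<filename>' using PigStorage('\t','-schema');
a1 = rank a0;
a2 = foreach a1 generate col1 .. col16, rank_a0 as sequence_number;
a3 = order a2 by sequence_number;
store a3 into 'outputfile' using PigStorage('\t','-schema');
```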