Re: how to access a mapper counter in reducer
Robert,

> I have made the above thing work.

Are there any plans to bring this into the Hadoop framework? There have been
similar queries about it in other forums as well. If you need help with
testing, documentation, or anything else, please let me know.

Regards,
Praveen

On Sat, Dec 3, 2011 at 2:34 AM, Robert Evans <[EMAIL PROTECTED]> wrote:

>  Anurag,
>
> The current set of counter APIs available from within a Map or Reduce
> process is write-only.  They are not intended to be used for reading data
> from other tasks.  They are there for collecting statistics about the job
> as a whole.  If you use too many of them, the performance of the system as a
> whole can get very bad, because they are stored in memory on the
> JobTracker.  Also, a map task that has finished “successfully” can later
> fail if the node it ran on dies before all of the map output has been
> fetched by all of the reducers.  This could result in a reducer reading
> counter data that is only partial or out of date.  You may be able to
> access it through the Job API, but I would not recommend it, and I think
> there may be some issues with security if you have security enabled, but I
> don’t know for sure.
>
> If you have an optimization that really needs summary data from each
> mapper in all reducers, then you should do it the map/reduce way.  When a
> mapper finishes, have it output one special key/value pair per reducer with
> the statistics in it.  You can know how many reducers there are because
> that is set in the configuration.  You then need a special partitioner to
> recognize those summary key/value pairs and make sure that each one goes to
> the proper reducer.  You also need a special comparator to make sure that
> these special keys are the very first ones read by the reducer, so it has
> the data before processing anything else.
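
[Inline note] The routing and ordering rules described in the paragraph above can be modeled without Hadoop. This is only a sketch in plain Java: `SummaryRouting`, `TaggedKey`, `partition`, `SORT_ORDER`, and `summaryKeys` are hypothetical names, not part of any Hadoop API. In a real job the same decisions would presumably live in a custom `Partitioner` and a sort comparator registered on the `Job`, with the summary records emitted from the mapper's `cleanup()` method. The non-summary branch assumes plain hash partitioning.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hadoop-free model of the idea; every name in this class is hypothetical.
final class SummaryRouting {

    /** A map output key: either an ordinary data key or a per-reducer summary marker. */
    static final class TaggedKey {
        final boolean summary;    // true for the special statistics record
        final int targetReducer;  // only meaningful when summary is true
        final String key;         // the ordinary key; empty for summary records

        private TaggedKey(boolean summary, int targetReducer, String key) {
            this.summary = summary;
            this.targetReducer = targetReducer;
            this.key = key;
        }

        static TaggedKey summaryFor(int reducer) {
            return new TaggedKey(true, reducer, "");
        }

        static TaggedKey data(String key) {
            return new TaggedKey(false, -1, key);
        }
    }

    /** Partitioning rule: summary keys go to their named reducer, data keys are hashed. */
    static int partition(TaggedKey k, int numReducers) {
        if (k.summary) {
            return k.targetReducer;
        }
        // Mask the sign bit so the result is never negative.
        return (k.key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** Sort rule: summary keys come before every data key. */
    static final Comparator<TaggedKey> SORT_ORDER = (a, b) -> {
        if (a.summary != b.summary) {
            return a.summary ? -1 : 1;
        }
        if (a.summary) {
            return Integer.compare(a.targetReducer, b.targetReducer);
        }
        return a.key.compareTo(b.key);
    };

    /** What one mapper would emit when it finishes: one summary key per reducer. */
    static List<TaggedKey> summaryKeys(int numReducers) {
        List<TaggedKey> out = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            out.add(TaggedKey.summaryFor(r));
        }
        return out;
    }
}
```

With several mappers, each reducer would receive one summary record per mapper under the same leading key, so the reducer would need to combine them before it starts on the ordinary keys.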
>
> I would also recommend that you don’t try to store this data in HDFS.  You
> can very easily do a DDOS on the namenode on a large cluster, and then your
> ops will yell at you as they did with me before I stopped doing it.  I have
> made the above thing work.  It is just a lot of work to do it right.
>
> --Bobby Evans
>
>
>
> On 12/1/11 1:18 PM, "Markus Jelsma" <[EMAIL PROTECTED]> wrote:
>
> Can you access it via the Job API?
>
>
> http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapreduce/Job.html#getCounters%28%29
>
> > Hi,
> >
> > I have a similar query.
> >
> > In fact, I sent it yesterday and am waiting for a response from anybody
> > who might have done it.
> >
> > Thanks,
> > Anurag Tangri
> >
> > 2011/11/30 rabbit_cheng <[EMAIL PROTECTED]>
> >
> > > I have created a counter in the mapper to count something, and I want
> > > to get the counter's value in the reducer phase. The code segment is as
> > > follows:
> > >
> > > public class MM extends Mapper<LongWritable, Text, Text, Text> {
> > >
> > >     static enum TEST { pt }
> > >
> > >     @Override
> > >     public void map(LongWritable key, Text values, Context context)
> > >             throws IOException, InterruptedException {
> > >         context.getCounter(TEST.pt).increment(1);
> > >     }
> > > }
> > >
> > > public class KMeansReducer extends Reducer<Text, Text, Text, Text> {
> > >
> > >     @Override
> > >     protected void setup(Context context)
> > >             throws IOException, InterruptedException {
> > >         long ptValue = context.getCounter(MM.TEST.pt).getValue();
> > >     }
> > > }
> > >
> > > but what I get is always 0, i.e., the value of the variable ptValue is
> > > always 0.
> > >
> > > Does anybody know how to access a mapper counter in a reducer?
>
>