MapReduce user mailing list - how to access a mapper counter in reducer


Re: how to access a mapper counter in reducer
Praveen Sripati 2011-12-10, 03:00
Robert,

I will take a shot at it. I think it would involve writing a custom
comparator and a partitioner, reading some config parameters, and sending
the counters as key/value pairs to the reducers. It shouldn't be that
difficult.

If I get stuck, I will post in the forum. It will also be a chance for me to
learn how to create a patch.

Regards,
Praveen

On Thu, Dec 8, 2011 at 9:45 PM, Robert Evans <[EMAIL PROTECTED]> wrote:

>  Sorry I have not responded sooner; I have had a number of fires at work
> to put out, and I haven’t been keeping up with the user mailing lists.  The
> code I did before was very specific to the task I was working on, and it
> was an ugly hack because I did not bother with the comparator; I already
> knew there was only a small predefined set of keys, so I just output one
> set of metadata for each key.
>
> I would be happy to put something like this into the map/reduce framework.
>  I have filed https://issues.apache.org/jira/browse/MAPREDUCE-3520 for
> this. I just don’t know when I will have the time to do that, especially
> with my work on the 0.23 release.  I’ll also talk to my management to see
> if they want to allow me to work on this during work, or if it will have to
> be in my spare time.  Please feel free to comment on the JIRA or vote for
> it if you feel that it is something that you want done.  Or, if you feel
> comfortable helping out, perhaps you could take a first crack at it.
>
> Thanks,
>
> Bobby Evans
>
>
> On 12/6/11 9:14 AM, "Mapred Learn" <[EMAIL PROTECTED]> wrote:
>
> Hi Praveen,
> Could you share it here so that we can use it?
>
> Thanks,
>
> Sent from my iPhone
>
> On Dec 6, 2011, at 6:29 AM, Praveen Sripati <[EMAIL PROTECTED]>
> wrote:
>
> Robert,
>
> > I have made the above thing work.
>
> Any plans to get it into the Hadoop framework? There have been similar
> queries about it in other forums as well. If you need any help with testing,
> documentation, or anything else, please let me know.
>
> Regards,
> Praveen
>
> On Sat, Dec 3, 2011 at 2:34 AM, Robert Evans <[EMAIL PROTECTED]> wrote:
>
> Anurag,
>
> The current counter APIs available from within a map or reduce process are
> write-only.  They are not intended to be used for reading data from other
> tasks.  They are there for collecting statistics about the job as a whole.
> If you use too many of them, the performance of the system as a whole can
> get very bad, because they are stored in memory on the JobTracker.  Also,
> there is the potential that a map task that finished “successfully” can
> later fail if the node it is running on dies before all of its map output
> has been fetched by all of the reducers.  This could result in a reducer
> reading counter data that is only partial or out of date.  You may be able
> to access the counters through the job API, but I would not recommend it,
> and I think there may be some issues with security if you have it enabled,
> but I don’t know for sure.
>
> If you have an optimization that really needs summary data from each
> mapper in all reducers, then you should do it the map/reduce way.  When a
> mapper finishes, output one special key/value pair per reducer with the
> statistics in it.  You can know how many reducers there are because that is
> set in the configuration.  You then need a special partitioner that
> recognizes those summary key/value pairs and makes sure that each one goes
> to the proper reducer.  You also need a special comparator to make sure
> that these special keys are the very first ones read by the reducer, so it
> has the data before processing anything else.
>
> I would also recommend that you don’t try to store this data in HDFS.  You
> can very easily DDoS the namenode on a large cluster, and then your ops team
> will yell at you, as they did at me before I stopped doing it.  I have made
> the above thing work.  It is just a lot of work to do it right.
>
> --Bobby Evans
>
>
>
> On 12/1/11 1:18 PM, "Markus Jelsma" <[EMAIL PROTECTED] <