Looking at the blog:
Counters represent global counters, defined either by the Map/Reduce
framework or applications. Applications can define arbitrary Counters
and update them in the map and/or reduce methods. These counters are
then globally aggregated by the framework.
Counters are appropriate for tracking few, important, global bits of
information. They are definitely not meant to aggregate very
fine-grained statistics of applications.
Counters are very expensive since the JobTracker has to maintain
every counter of every map/reduce task for the entire duration of the
application.
Grid Pattern: Applications should not use more than 10, 15 or 25 custom counters."
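The blog's point is that counters are per-task name-to-long maps that the framework sums into one global result. A minimal pure-Java sketch of that aggregation semantics (this is an illustration, not the Hadoop API; the counter names are made up):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CounterAggregation {
    // Each task reports its own counter map; the framework sums them
    // globally, the way the JobTracker aggregates task counters.
    static Map<String, Long> aggregate(List<Map<String, Long>> perTask) {
        Map<String, Long> global = new HashMap<>();
        for (Map<String, Long> task : perTask) {
            task.forEach((name, v) -> global.merge(name, v, Long::sum));
        }
        return global;
    }

    public static void main(String[] args) {
        // Two tasks incrementing hypothetical application counters.
        Map<String, Long> t1 = Map.of("RECORDS_SKIPPED", 3L);
        Map<String, Long> t2 = Map.of("RECORDS_SKIPPED", 2L, "BAD_LINES", 1L);
        System.out.println(aggregate(List.of(t1, t2)));
    }
}
```

The catch the blog warns about: the JobTracker holds every one of these per-task maps in memory until the job finishes, so the cost scales with counters × tasks, not just counters.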
I have to question that limitation; it seems arbitrary.
I agree that counters add overhead, but suppose I wanted to run the word-count MapReduce as a map-only job and use counters to capture a count per word.
At what point does the cost of the counters exceed the cost of the reduce phase?
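One way to reason about that break-even point is to estimate the JobTracker heap the counters consume. A rough back-of-envelope sketch; the ~100 bytes per counter per task is an assumed overhead (name string plus long value plus map-entry overhead), not a measured Hadoop figure, and the task count is hypothetical:

```java
public class CounterMemoryEstimate {
    public static void main(String[] args) {
        // Assumed per-counter cost held on the JobTracker: counter name
        // string + long value + object/entry overhead. ~100 bytes is a
        // guess, not a measured value.
        long bytesPerCounter = 100;
        long countersPerTask = 50_000;  // one counter per distinct word
        long mapTasks = 1_000;          // hypothetical job size

        // The JobTracker keeps every counter of every task for the
        // entire duration of the job, so the cost is multiplicative.
        long totalBytes = bytesPerCounter * countersPerTask * mapTasks;
        System.out.printf("~%.1f GB of JobTracker heap%n", totalBytes / 1e9);
    }
}
```

Under those assumptions, 50,000 counters across 1,000 map tasks costs on the order of 5 GB of JobTracker heap for a single job, which is why the cost grows much faster than a reduce phase that only ships and merges the same data once.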
> Date: Tue, 29 Mar 2011 10:28:19 +0100
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: Re: does counters go the performance down seriously?
> On 28/03/11 23:34, JunYoung Kim wrote:
> > hi,
> > this link is about good practices for Hadoop usage.
> > http://developer.yahoo.com/blogs/hadoop/posts/2010/08/apache_hadoop_best_practices_a/ by Arun C Murthy
> > if I want to use about 50,000 counters for a job, does it cause a serious performance hit?
> Yes, you will use up lots of JT memory and so put limits on the overall
> size of your cluster.
> If you have a small cluster and can crank up the memory settings on the
> JT to 48 GB, this isn't going to be an issue, but as Y! are topping out
> at these numbers anyway, lots of counters just overload them.