Harsh J 2012-10-19, 15:50
Thanks Harsh. Great learning from you as always. :)
Sent from handheld, please excuse typos.
From: Harsh J <[EMAIL PROTECTED]>
Date: Fri, 19 Oct 2012 21:20:07
To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Subject: Re: Hadoop counter
Bejoy is almost right, except that counters are reported as tasks make
progress (via TT heartbeats to the JT, actually), but the final counter
values are computed only from the reports of tasks that completed
successfully, not from any failed or killed ones.
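To sketch Harsh's point (a hypothetical model, not actual Hadoop code): the job's final counter value is the sum over the reports of successful task attempts only, so a failed or killed attempt's counts drop out of the final total even if the JT had seen them in earlier heartbeats.

```java
import java.util.ArrayList;
import java.util.List;

public class FinalCounters {
    enum State { SUCCEEDED, FAILED, KILLED }

    // One task attempt's final counter report (hypothetical model).
    static class Report {
        final State state;
        final long records;
        Report(State state, long records) { this.state = state; this.records = records; }
    }

    // The job-level total: only successful attempts contribute, mirroring
    // how the final counter values are computed from successful task reports.
    static long finalCount(List<Report> reports) {
        long total = 0;
        for (Report r : reports)
            if (r.state == State.SUCCEEDED) total += r.records;
        return total;
    }

    public static void main(String[] args) {
        List<Report> reports = new ArrayList<>();
        reports.add(new Report(State.SUCCEEDED, 100));
        reports.add(new Report(State.FAILED, 40));    // excluded from the total
        reports.add(new Report(State.KILLED, 25));    // excluded (e.g. speculative attempt)
        reports.add(new Report(State.SUCCEEDED, 60)); // a successful retry counts
        System.out.println(finalCount(reports));      // prints 160
    }
}
```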
On Fri, Oct 19, 2012 at 8:51 PM, Bejoy KS <[EMAIL PROTECTED]> wrote:
> Hi Jay
> Counters are reported at the end of a task to the JT. So if a task fails, the
> counters from that task are not sent to the JT and hence won't be included in
> the final counter values for that job.
> Bejoy KS
> Sent from handheld, please excuse typos.
> From: Jay Vyas <[EMAIL PROTECTED]>
> Date: Fri, 19 Oct 2012 10:18:42 -0500
> To: <[EMAIL PROTECTED]>
> Reply-To: [EMAIL PROTECTED]
> Subject: Re: Hadoop counter
> Ah, this answers a lot about why some of my dynamic counters never show up
> and I have to bite my nails waiting to see what's going on until the end of
> the job. Thanks!
> Another question: what happens if a task fails? What happens to the
> counters for it? Do they disappear into the ether? Or do they get merged
> in with the counters from other tasks?
> On Fri, Oct 19, 2012 at 9:50 AM, Bertrand Dechoux <[EMAIL PROTECTED]>
>> And by default the number of counters is limited to 120 with the
>> mapreduce.job.counters.limit property.
>> They are useful for displaying short statistics about a job but should not
>> be used for results (imho).
>> I know people may misuse them but I haven't tried so I wouldn't be able to
>> list the caveats.
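For reference, the limit Bertrand mentions can be raised in mapred-site.xml if a job legitimately needs more user counters (the value 200 below is just an illustration):

```xml
<!-- mapred-site.xml: cap on user-defined counters per job (default 120) -->
<property>
  <name>mapreduce.job.counters.limit</name>
  <value>200</value>
</property>
```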
>> On Fri, Oct 19, 2012 at 4:35 PM, Michael Segel <[EMAIL PROTECTED]>
>>> As I understand it... each Task has its own counters, and they are
>>> independently updated. As the tasks report back to the JT, they update the
>>> counters' status. The JT then aggregates them.
>>> In terms of performance, counters take up some memory in the JT, so while
>>> it's OK to use them, if you abuse them you can run into issues.
>>> As to limits... I guess that will depend on the amount of memory on the
>>> JT machine, the size of the cluster (number of TTs), and the number of
>>> tasks.
>>> In terms of global accessibility... maybe.
>>> The reason I say maybe is that I'm not sure what you mean by globally
>>> accessible. If a task creates and implements a dynamic counter... I know
>>> that it will eventually be reflected in the JT. However, I do not believe
>>> that a separate Task could connect to the JT and see whether the counter
>>> exists, or get a value, or even an accurate value, since the updates are
>>> asynchronous.
>>> Not to mention that I don't believe the counters are aggregated until the
>>> job ends. It would make sense for the JT to maintain a unique counter for
>>> each task until the tasks complete. (If a task fails, it would have to
>>> delete that task's counters so that when the task is restarted the correct
>>> count is maintained.) Note: I haven't looked at the source code, so I am
>>> probably wrong.
>>> On Oct 19, 2012, at 5:50 AM, Lin Ma <[EMAIL PROTECTED]> wrote:
>>> Hi guys,
>>> I have some quick questions regarding Hadoop counters:
>>> Is a Hadoop counter (custom defined) globally accessible (for both read and
>>> write) by all Mappers and Reducers in a job?
>>> What are the performance implications and best practices of using Hadoop
>>> counters? I am not sure whether using counters too heavily will degrade the
>>> performance of the whole job.
>> Bertrand Dechoux
> Jay Vyas
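Regarding the original question about global accessibility, the thread's answers can be sketched with a small self-contained simulation (hypothetical classes, not the Hadoop API): each task increments a local counter, and only a periodic heartbeat pushes its value to the JT, so any mid-job read of a counter can be stale.

```java
import java.util.ArrayList;
import java.util.List;

public class CounterVisibility {
    // Hypothetical model of one task attempt's counter state.
    static class Task {
        long local = 0;     // increments applied so far by this task
        long reported = 0;  // last value the JT has seen (sent on heartbeat)
        void increment() { local++; }
        void heartbeat() { reported = local; }
    }

    // The JT's mid-job view: the sum of the values last reported by heartbeat,
    // which can lag behind the true total.
    static long jtView(List<Task> tasks) {
        long sum = 0;
        for (Task t : tasks) sum += t.reported;
        return sum;
    }

    public static void main(String[] args) {
        List<Task> tasks = new ArrayList<>();
        Task a = new Task();
        Task b = new Task();
        tasks.add(a);
        tasks.add(b);

        for (int i = 0; i < 10; i++) a.increment();
        a.heartbeat();                     // JT now sees a = 10
        for (int i = 0; i < 7; i++) b.increment();
        // b has not heartbeated yet, so the JT's view is stale:
        System.out.println(jtView(tasks)); // prints 10, not 17

        // Once every task has reported its final value, the view is accurate:
        a.heartbeat();
        b.heartbeat();
        System.out.println(jtView(tasks)); // prints 17
    }
}
```

This is why counters work well as write-mostly job statistics but not as a read/write channel between tasks: a reader would see whatever happened to be reported last, not the live value.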