Re: Hadoop counter
Hi Mike,

Sorry, I am a bit lost, as you are thinking faster than me. :-P

From your statement "It would make sense that the JT maintains a
unique counter for each task until the tasks complete.", it seems that
tasks cannot see each other's counters, since the JT maintains a unique
counter for each task.

From your comment "I meant that if a Task created and updated a
counter, a different Task has access to that counter.", it seems that
different tasks could share and access the same counter.

I would appreciate it if you could help clarify which of the two you meant.

regards,
Lin

On Sat, Oct 20, 2012 at 12:42 AM, Michael Segel
<[EMAIL PROTECTED]> wrote:

>
> On Oct 19, 2012, at 11:27 AM, Lin Ma <[EMAIL PROTECTED]> wrote:
>
> Hi Mike,
>
> Thanks for the detailed reply. Two quick questions/comments,
>
> 1. By "task", do you mean a specific mapper instance or a specific reducer
> instance?
>
>
> Either.
>
> 2. "However, I do not believe that a separate Task could connect with the
> JT and see if the counter exists or if it could get a value or even an
> accurate value since the updates are asynchronous." -- do you mean that if
> one mapper is updating custom counter ABC and another mapper is updating the
> same custom counter ABC, their counter values are updated independently
> by the different mappers and will not be published (aggregated) externally
> until the job has completed successfully?
>
> I meant that if a Task created and updated a counter, a different Task has
> access to that counter.
>
> To give you an example, if I want to count the number of quality errors
> and then fail after X number of errors, I can't use Global counters to do
> this.
>
> regards,
> Lin
>
> On Fri, Oct 19, 2012 at 10:35 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
>
>> As I understand it... each Task has its own counters, which are
>> independently updated. As the tasks report back to the JT, they update the
>> counter(s)' status.
>> The JT will then aggregate them.
>>
>> In terms of performance, Counters take up some memory in the JT, so while
>> it's OK to use them, if you abuse them, you can run into issues.
>> As to limits... I guess that will depend on the amount of memory on the
>> JT machine, the size of the cluster (number of TTs) and the number of
>> counters.
>>
>> In terms of global accessibility... Maybe.
>>
>> The reason I say maybe is that I'm not sure what you mean by globally
>> accessible.
>> If a task creates and implements a dynamic counter... I know that it will
>> eventually be reflected in the JT. However, I do not believe that a
>> separate Task could connect with the JT and see if the counter exists or if
>> it could get a value or even an accurate value since the updates are
>> asynchronous.  Not to mention that I don't believe that the counters are
>> aggregated until the job ends. It would make sense that the JT maintains a
>> unique counter for each task until the tasks complete. (If a task fails, it
>> would have to delete the counters so that when the task is restarted the
>> correct count is maintained. )  Note, I haven't looked at the source code
>> so I am probably wrong.
>>
>> HTH
>> Mike
>> On Oct 19, 2012, at 5:50 AM, Lin Ma <[EMAIL PROTECTED]> wrote:
>>
>> Hi guys,
>>
>> I have some quick questions regarding Hadoop counters:
>>
>>
>>    - Is a custom Hadoop counter globally accessible (for both
>>    read and write) to all Mappers and Reducers in a job?
>>    - What are the performance implications and best practices of using
>>    Hadoop counters? If counters are used too heavily, will the performance
>>    of the whole job degrade?
>>
>> regards,
>> Lin
>>
>>
>>
>
>
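
The behaviour described in the thread (each task updates its own counters, reports them back, and the JT aggregates the reports) can be pictured with a small self-contained model. This is only a sketch of that description, which Mike himself notes was not checked against the source code; it does not use or reproduce Hadoop's API. The names `CounterModel`, `TaskCounters`, `Tracker`, and the task IDs are hypothetical and exist only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class CounterModel {

    /** Counters private to a single task; other tasks hold no reference to them. */
    static final class TaskCounters {
        private final Map<String, Long> values = new HashMap<>();

        void increment(String name, long delta) {
            values.merge(name, delta, Long::sum);
        }

        long get(String name) {
            return values.getOrDefault(name, 0L);
        }

        Map<String, Long> snapshot() {
            return new HashMap<>(values);
        }
    }

    /** Hypothetical stand-in for the JT: keeps the latest report per task. */
    static final class Tracker {
        private final Map<String, Map<String, Long>> latestByTask = new HashMap<>();

        /** A task reports asynchronously; only the reported snapshot is visible here. */
        void report(String taskId, TaskCounters counters) {
            latestByTask.put(taskId, counters.snapshot());
        }

        /** Drop a failed task's counters so a restarted task does not double count. */
        void discard(String taskId) {
            latestByTask.remove(taskId);
        }

        /** Aggregate one counter across every task that has reported so far. */
        long total(String name) {
            long sum = 0;
            for (Map<String, Long> reported : latestByTask.values()) {
                sum += reported.getOrDefault(name, 0L);
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        Tracker jt = new Tracker();
        TaskCounters mapperA = new TaskCounters();
        TaskCounters mapperB = new TaskCounters();

        // Both mappers update "the same" custom counter ABC, each in its own copy.
        mapperA.increment("ABC", 3);
        mapperB.increment("ABC", 2);

        jt.report("map_0", mapperA);
        System.out.println("Total after only A has reported: " + jt.total("ABC"));

        jt.report("map_1", mapperB);
        System.out.println("Total after both have reported: " + jt.total("ABC"));

        // Mapper A still sees only its own increments, never B's.
        System.out.println("Mapper A's local view: " + mapperA.get("ABC"));
    }
}
```

In this model the aggregated total lags behind the tasks' local values until each task reports, which is why a task could not rely on a "global" value to, say, fail after X quality errors across the whole job.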