Re: Hadoop counter
Thanks for the detailed reply, Mike. I learned a lot from the discussion.

- I just want to confirm one point with you. Within a single job, once a
specific task completes (and, as I understand from our discussion, its
counters are aggregated in the JT at that point), is it right that the other
tasks still running in that job cannot read the updated counter value from
the completed task? I am asking because I am wondering whether I can use a
counter to share a global value between tasks.
- If so, what is the typical use case for counters? Are counter values only
used after the whole job completes?

BTW, I would appreciate it if you could share a few use cases from your
experience of how counters are used.
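
For reference, the counter flow described in the quoted reply below (each task keeps its own counter, reports it to the JT, and the JT sums the per-task values) can be sketched as a toy simulation. This is only an illustration in plain Python, not Hadoop code: `ToyJobTracker`, `ToyTask`, `report`, `heartbeat`, and the counter name `BAD_RECORDS` are all hypothetical names made up for this sketch, and the real reporting mechanism is asynchronous and more involved.

```python
from collections import defaultdict


class ToyJobTracker:
    """Hypothetical stand-in for the JobTracker; not a Hadoop class."""

    def __init__(self):
        # One counter table per task ID, mirroring "a separate counter for each task".
        self.per_task = defaultdict(dict)

    def report(self, task_id, counters):
        # A task's report replaces that task's previous snapshot.
        self.per_task[task_id] = dict(counters)

    def aggregate(self, name):
        # The job-level value is the sum over every task's own counter.
        return sum(c.get(name, 0) for c in self.per_task.values())


class ToyTask:
    """Hypothetical mapper or reducer task with purely local counters."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.counters = defaultdict(int)

    def increment(self, name, amount=1):
        # Only this task's private copy changes.
        self.counters[name] += amount

    def heartbeat(self, tracker):
        # Updates travel one way in this model: task -> tracker.
        tracker.report(self.task_id, self.counters)


tracker = ToyJobTracker()
task_a, task_b = ToyTask("map_0"), ToyTask("map_1")

for _ in range(3):
    task_a.increment("BAD_RECORDS")
for _ in range(4):
    task_b.increment("BAD_RECORDS")

task_a.heartbeat(tracker)
task_b.heartbeat(tracker)

local_a = task_a.counters["BAD_RECORDS"]
local_b = task_b.counters["BAD_RECORDS"]
total = tracker.aggregate("BAD_RECORDS")
print(local_a, local_b, total)
```

In this model, a check such as "fail after 5 bad records" made inside either task would never fire, because each task only sees its own count (3 and 4), while the aggregated total held by the tracker is 7. That matches the quality-error example in the quoted thread.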


On Sat, Oct 20, 2012 at 5:05 AM, Michael Segel <[EMAIL PROTECTED]> wrote:

> Yeah, sorry...
> I meant that if you were dynamically creating a counter foo in the Mapper
> task, then each mapper would be creating its own counter foo.
> As the job runs, these counters will eventually be sent up to the JT. The
> job tracker would keep a separate counter for each task.
> At the end, the final count is aggregated from the list of counters for
> foo.
> I don't know how you can get a task to ask the Job Tracker for information
> on how things are going in other tasks.  That is what I meant when I said
> that you couldn't get information about the other counters or even the
> status of the other tasks running in the same job.
> I didn't see anything in the APIs that allowed for that type of flow... Of
> course having said that... someone pops up with a way to do just that. ;-)
> Does that clarify things?
> -Mike
> On Oct 19, 2012, at 11:56 AM, Lin Ma <[EMAIL PROTECTED]> wrote:
> Hi Mike,
> Sorry, I am a bit lost... You are thinking faster than me. :-P
> From your statement "It would make sense that the JT maintains a
> unique counter for each task until the tasks complete." -- it seems each
> task cannot see the other tasks' counters, since the JT maintains a unique
> counter for each task;
> From your comment "I meant that if a Task created and updated a
> counter, a different Task has access to that counter. " -- it seems
> different tasks could share/access the same counter.
> I would appreciate it if you could help to clarify a bit.
> regards,
> Lin
> On Sat, Oct 20, 2012 at 12:42 AM, Michael Segel <[EMAIL PROTECTED]> wrote:
>> On Oct 19, 2012, at 11:27 AM, Lin Ma <[EMAIL PROTECTED]> wrote:
>> Hi Mike,
>> Thanks for the detailed reply. Two quick questions/comments,
>> 1. By "task", do you mean a specific mapper instance, or a specific reducer
>> instance?
>> Either.
>> 2. "However, I do not believe that a separate Task could connect with the
>> JT and see if the counter exists or if it could get a value or even an
>> accurate value since the updates are asynchronous." -- do you mean that if a
>> mapper is updating custom counter ABC, and another mapper is updating the
>> same custom counter ABC, their counter values are updated independently
>> by the different mappers, and will not be published (aggregated) externally
>> until the job has completed successfully?
>> I meant that if a Task created and updated a counter, a different Task
>> has access to that counter.
>> To give you an example, if I want to count the number of quality errors
>> and then fail after X number of errors, I can't use Global counters to do
>> this.
>> regards,
>> Lin
>> On Fri, Oct 19, 2012 at 10:35 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
>>> As I understand it... each Task has its own counters, which are
>>> independently updated. As the tasks report back to the JT, they update the
>>> counter(s)' status.
>>> The JT then will aggregate them.
>>> In terms of performance, Counters take up some memory in the JT, so while
>>> it's OK to use them, if you abuse them, you can run into issues.
>>> As to limits... I guess that will depend on the amount of memory on the
>>> JT machine, the size of the cluster (Number of TT) and the number of