Re: Hadoop counter
Michael Segel 2012-10-23, 03:57
Yup.
The counters at the end of the job are the most accurate.

On Oct 22, 2012, at 3:00 AM, Lin Ma <[EMAIL PROTECTED]> wrote:

> Thanks so much for the help, Mike. I learned a lot from this discussion.
>
> So, the conclusion I take from this discussion is: since how and when the JT merges counters in the middle of a job is undefined internal behavior, it is more reliable to read the counters only after the whole job completes? Agree?
>
> regards,
> Lin
>
> On Sun, Oct 21, 2012 at 8:15 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
>
> On Oct 21, 2012, at 1:45 AM, Lin Ma <[EMAIL PROTECTED]> wrote:
>
>> Thanks for the detailed reply, Mike. Yes, most of my confusion is resolved. The last two questions (or comments) are just to confirm that my understanding is correct:
>>
>> - Is it a normal use case, or best practice, for a job to automatically consume/read the counters of a previously completed job? I ask this because I am not sure whether the main use case for counters is human reading and manual analysis, as opposed to having another job consume the counters automatically.
>
> Lin,
> Every job has a set of counters to maintain job statistics.
> This is specifically for human analysis and to help understand what happened with your job.
> It allows you to see how much data is read in by the job and how many records were processed, measured against how long the job took to complete. It also shows you how much data is written back out.
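>
> As a quick illustration, a job can also define its own counters on top of the built-in ones. Here's a rough, untested sketch of a mapper that does this (the class and counter names are made up for illustration):
>
> import java.io.IOException;
>
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Mapper;
>
> public class ParseMapper extends Mapper<LongWritable, Text, Text, Text> {
>     // Enum-backed custom counters; the framework aggregates these across all tasks.
>     public enum QC { GOOD_RECORDS, BAD_RECORDS }
>
>     @Override
>     protected void map(LongWritable key, Text value, Context context)
>             throws IOException, InterruptedException {
>         String[] fields = value.toString().split("\t");
>         if (fields.length < 3) {
>             // Defective record: count it and skip it.
>             context.getCounter(QC.BAD_RECORDS).increment(1);
>             return;
>         }
>         context.getCounter(QC.GOOD_RECORDS).increment(1);
>         context.write(new Text(fields[0]), new Text(fields[1]));
>     }
> }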
>
> In addition to this, a set of use cases for counters in Hadoop centers on quality control. It's normal to chain jobs together to form a job flow.
> A typical use case for Hadoop is to pull data from various sources, combine them, and do some processing on them, resulting in a data set that gets sent to another system for visualization.
>
> In this use case, there are usually data cleansing and validation jobs. As they run, it's possible to track the number of defective records. At the end of that specific job, from the ToolRunner, or whichever job class you used to launch your job, you can get the aggregated counters for the job and determine whether the process passed or failed. Based on this, you can exit your program with either a success or failure flag. Job-flow control tools like Oozie can capture this and then decide to continue, or to stop and alert an operator of an error.
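>
> In code, the tail end of the driver might look something like this (again just a sketch; MAX_BAD_RECORDS is an invented constant and QC is the made-up counter enum from the mapper sketch above):
>
> // Inside Tool.run(), after setting up the Job:
> boolean ok = job.waitForCompletion(true);
> long bad = job.getCounters().findCounter(ParseMapper.QC.BAD_RECORDS).getValue();
>
> // A non-zero exit code tells Oozie (or any workflow tool) this step failed.
> if (!ok || bad > MAX_BAD_RECORDS) {
>     return 1;
> }
> return 0;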
>
>> - I want to confirm my understanding is correct: when each task completes, the JT will aggregate/update the global counter values with the counter values reported by the completed task, but it never exposes the global counter values until the job completes? If that is correct, I am wondering why the JT does this aggregation each time a task completes, rather than doing a one-time aggregation when the job completes. Is there a design reason? Thanks.
>
> That's a good question. I haven't looked at the code, so I can't say definitively when the JT performs its aggregation. However, while the job is running, we can look at the JobTracker web page(s) and see the counter summary. This implies that there has to be some aggregation occurring mid-flight. (It would be trivial to sum the list of counters periodically to update the job statistics.) Note too that if the JT web pages can show a counter, it's possible to write a monitoring tool that watches the job while it runs and kills it mid-flight if a counter crosses a certain threshold.
>
> That is to say, you could in theory write a monitoring process that watches the counters. If, let's say, an error counter hits a predetermined threshold, you could then issue a 'hadoop job -kill <job-id>' command.
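>
> A bare-bones, untested sketch of such a watcher using the old mapred client API (the 30-second poll, the threshold, and the counter group/name strings are arbitrary placeholders):
>
> import org.apache.hadoop.mapred.Counters;
> import org.apache.hadoop.mapred.JobClient;
> import org.apache.hadoop.mapred.JobConf;
> import org.apache.hadoop.mapred.JobID;
> import org.apache.hadoop.mapred.RunningJob;
>
> public class CounterWatcher {
>     public static void main(String[] args) throws Exception {
>         JobClient client = new JobClient(new JobConf());
>         RunningJob running = client.getJob(JobID.forName(args[0]));
>
>         while (!running.isComplete()) {
>             Counters counters = running.getCounters();
>             // Substitute the group/name your job actually reports.
>             long bad = counters.findCounter("QC", "BAD_RECORDS").getCounter();
>             if (bad > 10000) {
>                 running.killJob(); // same effect as 'hadoop job -kill <job-id>'
>                 break;
>             }
>             Thread.sleep(30 * 1000);
>         }
>     }
> }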
>
>>
>> regards,
>> Lin
>>
>> On Sat, Oct 20, 2012 at 3:12 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
>>
>> On Oct 19, 2012, at 10:27 PM, Lin Ma <[EMAIL PROTECTED]> wrote:
>>
>>> Thanks for the detailed reply, Mike. I learned a lot from the discussion.
>>>
>>> - I just want to confirm: supposing that, within the same job, a specific task has completed (and its counters are aggregated in the JT after the task completes, per our discussion?), the other tasks still running in that job cannot get the updated counter values from the completed task? I am asking this because I am wondering whether I can use counters to share a global value between tasks.