Re: Hadoop counter
Thanks so much for the help, Mike. I learned a lot from this discussion.

So, the conclusion I take from the discussion is: since how and when the
JT merges counters in the middle of a running job is undefined, internal
behavior, it is more reliable to read the counters only after the whole job
completes? Agree?

regards,
Lin

On Sun, Oct 21, 2012 at 8:15 PM, Michael Segel <[EMAIL PROTECTED]> wrote:

>
> On Oct 21, 2012, at 1:45 AM, Lin Ma <[EMAIL PROTECTED]> wrote:
>
> Thanks for the detailed reply, Mike. Yes, you have resolved most of my
> confusion. The last two questions (or comments) are just to confirm that
> my understanding is correct:
>
> - Is it a normal use case, or best practice, for a job to automatically
> consume/read the counters of a previously completed job? I ask this
> because I am not sure whether the main use case for counters is human
> reading and manual analysis, rather than having another job consume the
> counters automatically.
>
>
> Lin,
> Every job has a set of counters that maintain job statistics.
> These exist specifically for human analysis, to help you understand what
> happened with your job.
> They let you see how much data the job read in and how many records it
> processed, measured against how long the job took to complete. They also
> show you how much data was written back out.
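>
> As a quick sketch (assuming the newer mapreduce API; the TaskCounter enum
> and its names can vary across Hadoop versions), those built-in statistics
> can be read off a completed Job object like this:
>
>     import org.apache.hadoop.mapreduce.Counters;
>     import org.apache.hadoop.mapreduce.Job;
>     import org.apache.hadoop.mapreduce.TaskCounter;
>
>     public class JobStats {
>         // Print a few of the built-in I/O statistics of a completed job.
>         static void printStats(Job job) throws Exception {
>             Counters c = job.getCounters(); // totals across all tasks
>             System.out.println("map input records: "
>                 + c.findCounter(TaskCounter.MAP_INPUT_RECORDS).getValue());
>             System.out.println("reduce output records: "
>                 + c.findCounter(TaskCounter.REDUCE_OUTPUT_RECORDS).getValue());
>         }
>     }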
>
> In addition to this, a set of use cases for counters in Hadoop centers on
> quality control. It's normal to chain jobs together to form a job flow.
> A typical use case for Hadoop is to pull data from various sources,
> combine them and do some processing on them, resulting in a data set that
> gets sent to another system for visualization.
>
> In this use case, there are usually data cleansing and validation jobs.
> As they run, it's possible to track the number of defective records. At
> the end of that specific job, from the ToolRunner, or whichever job class
> you used to launch your job, you can then get these aggregated counters
> for the job and determine if the process passed or failed. Based on this,
> you can exit your program with either a success or a failure flag. Job-flow
> control tools like Oozie can capture this and then decide to continue, or
> to stop and alert an operator of an error.
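>
> As a concrete sketch (the counter enum, validity check, and QC threshold
> below are all hypothetical), a mapper can count defective records and the
> ToolRunner driver can read the aggregated total after completion to set
> the exit flag:
>
>     import java.io.IOException;
>     import org.apache.hadoop.conf.Configured;
>     import org.apache.hadoop.io.LongWritable;
>     import org.apache.hadoop.io.NullWritable;
>     import org.apache.hadoop.io.Text;
>     import org.apache.hadoop.mapreduce.Job;
>     import org.apache.hadoop.mapreduce.Mapper;
>     import org.apache.hadoop.util.Tool;
>     import org.apache.hadoop.util.ToolRunner;
>
>     public class CleansingJob extends Configured implements Tool {
>         // Hypothetical counter for tracking defective records.
>         public enum Quality { BAD_RECORDS }
>
>         public static class ValidatingMapper
>                 extends Mapper<LongWritable, Text, Text, NullWritable> {
>             @Override
>             protected void map(LongWritable key, Text value, Context ctx)
>                     throws IOException, InterruptedException {
>                 if (value.toString().trim().isEmpty()) { // assumed check
>                     ctx.getCounter(Quality.BAD_RECORDS).increment(1);
>                     return;                              // drop bad record
>                 }
>                 ctx.write(value, NullWritable.get());
>             }
>         }
>
>         public int run(String[] args) throws Exception {
>             Job job = Job.getInstance(getConf(), "data-cleansing");
>             job.setJarByClass(CleansingJob.class);
>             job.setMapperClass(ValidatingMapper.class);
>             // ... input/output paths, output key/value classes, etc. ...
>             if (!job.waitForCompletion(true)) {
>                 return 1;                // the job itself failed
>             }
>             // Counters are final only once the whole job has completed.
>             long bad =
>                 job.getCounters().findCounter(Quality.BAD_RECORDS).getValue();
>             return (bad > 100) ? 1 : 0;  // hypothetical threshold
>         }
>
>         public static void main(String[] args) throws Exception {
>             System.exit(ToolRunner.run(new CleansingJob(), args));
>         }
>     }
>
> Oozie (or any other flow controller) then only needs the process exit code
> to decide whether the flow continues.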
>
> - I want to confirm that my understanding is correct: when each task
> completes, the JT aggregates/updates the global counter values from the
> counter values reported by the completed task, but never exposes the
> global counter values until the job completes? If that is correct, I am
> wondering why the JT does the aggregation each time a task completes,
> rather than doing a one-time aggregation when the job completes? Are there
> any design reasons for this choice? Thanks.
>
>
> That's a good question. I haven't looked at the code, so I can't say
> definitively when the JT performs its aggregation. However, while the job
> runs, we can look at the job tracker web page(s) and see the counter
> summary. This implies that there has to be some aggregation occurring
> mid-flight. (It would be trivial to sum the list of counters periodically
> to update the job statistics.)  Note too that if the JT web pages can show
> a counter, it's possible to write a monitoring tool that watches the job
> while it runs and kills it mid-flight if a certain threshold of a counter
> is met.
>
> That is to say, you could in theory write a monitoring process and watch
> the counters. If, let's say, an error counter hits a predetermined
> threshold, you could then issue a 'hadoop job -kill <job-id>' command.
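>
> A rough sketch of such a watchdog (using the older mapred client API; the
> counter group/name, threshold argument, and poll interval are made up for
> illustration):
>
>     import org.apache.hadoop.mapred.Counters;
>     import org.apache.hadoop.mapred.JobClient;
>     import org.apache.hadoop.mapred.JobConf;
>     import org.apache.hadoop.mapred.JobID;
>     import org.apache.hadoop.mapred.RunningJob;
>
>     public class CounterWatchdog {
>         public static void main(String[] args) throws Exception {
>             JobClient client = new JobClient(new JobConf());
>             RunningJob job = client.getJob(JobID.forName(args[0]));
>             long threshold = Long.parseLong(args[1]);
>             while (!job.isComplete()) {
>                 Counters counters = job.getCounters();
>                 // "Quality"/"BAD_RECORDS" is a hypothetical group/name.
>                 long errors =
>                     counters.findCounter("Quality", "BAD_RECORDS").getCounter();
>                 if (errors > threshold) {
>                     job.killJob(); // same effect as 'hadoop job -kill <id>'
>                     break;
>                 }
>                 Thread.sleep(30 * 1000); // poll the JT every 30 seconds
>             }
>         }
>     }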
>
>
> regards,
> Lin
>
> On Sat, Oct 20, 2012 at 3:12 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
>
>>
>> On Oct 19, 2012, at 10:27 PM, Lin Ma <[EMAIL PROTECTED]> wrote:
>>
>> Thanks for the detailed reply, Mike. I learned a lot from the discussion.
>>
>> - I just want to confirm with you: supposing that, in the same job, when a
>> specific task completes (and its counters are aggregated in the JT after
>> the task completes, per our discussion?), the other running tasks in the
>> same job