MapReduce >> mail # user >> Hadoop counter


Re: Hadoop counter
Thanks for the detailed reply, Mike. Yes, most of my confusion is resolved.
The last two questions (or comments) are just to confirm that my
understanding is correct:

- Is it a normal use case, or best practice, for a job to consume/read the
counters from a previously completed job in an automatic way? I ask this
because I am not sure whether the main use case for counters is human
reading and manual analysis, rather than having another job automatically
consume the counters.
- I want to confirm my understanding is correct: when each task completes,
the JT aggregates/updates the global counter values from the counter values
reported by that completed task, but never exposes the global counter
values until the job completes? If that is correct, I am wondering why the
JT does the aggregation each time a task completes, rather than doing a
one-time aggregation when the job completes. Is there a design reason?
Thanks.
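To make the first question concrete, the kind of automatic hand-off I have in mind could be sketched like this in plain Java. FinishedJob and JobConfig are made-up stand-ins for Hadoop's job handle and Configuration, not the real API; in the real API the read would be something like job.getCounters().findCounter(group, name).getValue():

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in (assumption) for a completed Hadoop job's counter access.
class FinishedJob {
    private final Map<String, Long> counters = new HashMap<>();
    void setCounter(String name, long v) { counters.put(name, v); }
    long getCounter(String name) { return counters.getOrDefault(name, 0L); }
}

// Stand-in (assumption) for the next job's Configuration.
class JobConfig {
    private final Map<String, String> props = new HashMap<>();
    void set(String key, String value) { props.put(key, value); }
    String get(String key) { return props.get(key); }
}

public class CounterHandoff {
    // Copy a counter from the completed first job into the second job's
    // configuration, so the second job's tasks can read it at setup time.
    static JobConfig configureSecondJob(FinishedJob first) {
        JobConfig conf = new JobConfig();
        conf.set("first.job.record.count",
                 Long.toString(first.getCounter("TOTAL_RECORDS")));
        return conf;
    }

    public static void main(String[] args) {
        FinishedJob first = new FinishedJob();
        first.setCounter("TOTAL_RECORDS", 1234L);
        JobConfig conf = configureSecondJob(first);
        System.out.println(conf.get("first.job.record.count")); // prints 1234
    }
}
```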

regards,
Lin

On Sat, Oct 20, 2012 at 3:12 PM, Michael Segel <[EMAIL PROTECTED]> wrote:

>
> On Oct 19, 2012, at 10:27 PM, Lin Ma <[EMAIL PROTECTED]> wrote:
>
> Thanks for the detailed reply Mike, I learned a lot from the discussion.
>
> - I just want to confirm with you that, supposing in the same job, when a
> specific task has completed (and its counters have been aggregated in the JT
> after the task completed, per our discussion), the other tasks still running
> in the same job cannot get the updated counter value from that completed
> task? I am asking because I am wondering whether I can use a counter to
> share a global value between tasks.
>
>
> Yes that is correct.
> While I haven't looked at YARN (M/R 2.0), M/R 1.x doesn't have an easy
> way for a task to query the job tracker. This might have changed in YARN.
>
> - If so, what is the traditional use case for counters? Only using counter
> values after the whole job completes?
>
> Yes the counters are used to provide data at the end of the job...
>
> BTW: I'd appreciate it if you could share a few use cases from your
> experience about how counters are used.
>
> Well you have your typical job data like the number of records processed,
> total number of bytes read,  bytes written...
>
> But suppose you wanted to do some quality control on your input.
> So you need to keep track of the count of bad records. If this job is
> part of a process, you may want to include business logic in your job to
> halt the job flow if X% of the records contain bad data.
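That quality-control pattern might be sketched roughly like this in plain Java. The map function, record format, and map-based counters here are illustrative stand-ins for a real Mapper's context.getCounter(...).increment(1), not Hadoop API:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the quality-control pattern: the "mapper" bumps a counter for
// every malformed record it sees, while still counting every record.
// Assumption for illustration: a record is a comma-separated key,value pair.
public class BadRecordCounting {
    static void map(String record, Map<String, Long> counters) {
        counters.merge("TOTAL_RECORDS", 1L, Long::sum);
        String[] fields = record.split(",");
        if (fields.length != 2 || fields[1].isEmpty()) {
            // Flag the bad record and keep going; the driver decides later
            // whether the overall bad-record ratio is acceptable.
            counters.merge("BAD_RECORDS", 1L, Long::sum);
        }
    }

    public static void main(String[] args) {
        Map<String, Long> counters = new HashMap<>();
        for (String rec : new String[] {"a,1", "b,", "c,3", "broken"}) {
            map(rec, counters);
        }
        System.out.println(counters.get("BAD_RECORDS")); // prints 2
    }
}
```

After the job finishes, the driver would read BAD_RECORDS and TOTAL_RECORDS from the job's counters and halt the flow if the ratio exceeds the chosen threshold.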
>
> Or your process takes input records and, in processing them, sorts the
> records based on some characteristic, and you want to count those sorted
> records as you process them.
>
> For a more concrete example, the Illinois Tollway has these 'fast pass'
> lanes where cars equipped with RFID tags can have the tolls automatically
> deducted from their accounts rather than pay the toll manually each time.
>
> Suppose we wanted to determine how many cars in the 'Fast Pass' lanes are
> cheaters, i.e. they drive through the sensor and the sensor doesn't capture
> an RFID tag. (Note it's possible to have a false positive, where the
> car has an RFID chip but doesn't trip the sensor.) Pushing the data through
> a map/reduce job would require the use of counters.
>
> Does that help?
>
> -Mike
>
> regards,
> Lin
>
> On Sat, Oct 20, 2012 at 5:05 AM, Michael Segel <[EMAIL PROTECTED]> wrote:
>
>> Yeah, sorry...
>>
>> I meant that if you were dynamically creating a counter foo in the Mapper
>> task, then each mapper would be creating its own counter foo.
>> As the job runs, these counters will eventually be sent up to the JT. The
>> job tracker would keep a separate counter for each task.
>>
>> At the end, the final count is aggregated from the list of counters for
>> foo.
>>
>>
>> I don't know how you can get a task to ask the Job Tracker how things are
>> going in other tasks. That is what I meant when I said you couldn't get
>> information about the other counters, or even the status of the other
>> tasks running in the same job.
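The per-task bookkeeping described above could be pictured roughly like this in plain Java. This is a toy model of the JT's counter table, not Hadoop's actual implementation; the task ids and the overwrite-on-re-report behavior are assumptions for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the job tracker's bookkeeping for one counter "foo":
// each task reports its own value, the JT keeps one entry per task, and
// the job-level value is the sum over all tasks.
public class CounterAggregation {
    // One counter value per task, keyed by task id.
    private final Map<String, Long> perTaskFoo = new HashMap<>();

    // Called when a task reports its current value. A task reporting again
    // (e.g. a re-executed attempt) overwrites its previous value rather than
    // adding to it, which is one plausible reason to keep per-task values
    // instead of a single running total.
    void report(String taskId, long value) {
        perTaskFoo.put(taskId, value);
    }

    // The job-level counter: the sum of the latest value from every task.
    long jobTotal() {
        return perTaskFoo.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        CounterAggregation jt = new CounterAggregation();
        jt.report("task_0001", 10L);
        jt.report("task_0002", 5L);
        jt.report("task_0001", 12L); // task 1 re-reports: overwrite, not add
        System.out.println(jt.jobTotal()); // prints 17
    }
}
```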