MapReduce >> mail # user >> Info required regarding JobTracker Job Details/Metrics
Re: Info required regarding JobTracker Job Details/Metrics
Hi,

Thanks for your replies.
Any idea how I can calculate the "total shuffle time"?
I can get and calculate the total time taken by all the Mappers and all the
Reducers separately, but the intermediate shuffle/sort time is absent. Any
clue?

Thanks,
Gaurav Dasgupta
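
A possible approach, sketched under assumptions: the job counters quoted further down do not include a shuffle or sort duration, so the phase times would have to come from per-attempt timestamps instead. If the start, shuffle-finished, sort-finished and finish times of each reduce attempt can be obtained (for example from the job history or the reduce task pages of the JobTracker web UI), the totals are simple differences. The record layout, field names and sample timestamps below are hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class ReduceAttempt:
    # All fields are epoch milliseconds; the names are illustrative only.
    start: int
    shuffle_finished: int
    sort_finished: int
    finish: int


def phase_totals(attempts):
    """Sum the time every reduce attempt spent in shuffle, sort and reduce."""
    totals = {"shuffle_ms": 0, "sort_ms": 0, "reduce_ms": 0}
    for a in attempts:
        totals["shuffle_ms"] += a.shuffle_finished - a.start
        totals["sort_ms"] += a.sort_finished - a.shuffle_finished
        totals["reduce_ms"] += a.finish - a.sort_finished
    return totals


# Made-up timestamps for two reduce attempts.
attempts = [
    ReduceAttempt(start=1000, shuffle_finished=4000, sort_finished=4500, finish=9000),
    ReduceAttempt(start=1200, shuffle_finished=5200, sort_finished=5300, finish=8300),
]
print(phase_totals(attempts))
```

Keep in mind that reducers can start copying map output before all maps have finished, so a summed shuffle time overlaps with the map phase and is not a wall-clock gap between the two.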
On Thu, Aug 23, 2012 at 5:26 PM, Sonal Goyal <[EMAIL PROTECTED]> wrote:

> Gaurav,
>
> You can also refer to Tom White's Hadoop: The Definitive Guide, Chapter 8,
> which has a reference to each of the job counters. I believe the Apache
> site also had a page detailing the counters, but I can't seem to locate it.
>
> Best Regards,
> Sonal
> Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
> Nube Technologies <http://www.nubetech.co/>
>
> <http://in.linkedin.com/in/sonalgoyal>
>
> On Thu, Aug 23, 2012 at 5:20 PM, Bejoy Ks <[EMAIL PROTECTED]> wrote:
>
>> Hi Gaurav
>>
>> If it is just a simple word count example:
>> Map input size = HDFS_BYTES_READ
>> Reduce output size = HDFS_BYTES_WRITTEN
>> Reduce input size should be Map output bytes
>>
>> FILE_BYTES_WRITTEN is what the job writes to the local file system.
>> AFAIK it is the map tasks' intermediate output written to the LFS.
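
The mapping above can be sketched as a small script that reads JobClient counter lines and reports the sizes. This is only an illustration: the regular expression and the names `parse_counters` and `sizes` are my own, the sample lines are copied from the job output quoted further down this thread, and the parser assumes each counter sits on a single unwrapped line.

```python
import re

# Matches "<counter name>=<integer>" at the end of a JobClient log line.
COUNTER_RE = re.compile(r"mapred\.JobClient:\s+(.+?)=(\d+)\s*$")


def parse_counters(log_text):
    """Return {counter name: integer value} for every counter line found."""
    counters = {}
    for line in log_text.splitlines():
        m = COUNTER_RE.search(line)
        if m:
            counters[m.group(1).strip()] = int(m.group(2))
    return counters


sample = """\
12/08/23 04:31:22 INFO mapred.JobClient:   FileSystemCounters
12/08/23 04:31:22 INFO mapred.JobClient:     HDFS_BYTES_READ=394195953674
12/08/23 04:31:22 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=2046757548
12/08/23 04:31:22 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=28095
12/08/23 04:31:22 INFO mapred.JobClient:     Map output bytes=468042247685
"""

c = parse_counters(sample)

# The interpretation suggested in the message above.
sizes = {
    "map_input_bytes": c["HDFS_BYTES_READ"],
    "reduce_input_bytes": c["Map output bytes"],
    "reduce_output_bytes": c["HDFS_BYTES_WRITTEN"],
    "local_fs_bytes_written": c["FILE_BYTES_WRITTEN"],
}
for name, value in sizes.items():
    print(f"{name}: {value}")
```

Since this job's output also lists Combine input and output records, a combiner ran, so the data that actually reached the reducers is probably much smaller than Map output bytes; the Reduce shuffle bytes counter is worth comparing.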
>>
>>
>> Regards
>> Bejoy KS
>>
>>
>> On Thu, Aug 23, 2012 at 4:54 PM, Gaurav Dasgupta <[EMAIL PROTECTED]> wrote:
>>
>>> Sorry, the correct outcomes for the single wordcount job are:
>>>
>>> 12/08/23 04:31:22 INFO mapred.JobClient: Job complete: job_201208230144_0002
>>> 12/08/23 04:31:22 INFO mapred.JobClient: Counters: 26
>>> 12/08/23 04:31:22 INFO mapred.JobClient:   Job Counters
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Launched reduce tasks=64
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=103718235
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Launched map tasks=3060
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Data-local map tasks=3060
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=9208855
>>> 12/08/23 04:31:22 INFO mapred.JobClient:   FileSystemCounters
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     FILE_BYTES_READ=58263069209
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     HDFS_BYTES_READ=394195953674
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=2046757548
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=28095
>>> 12/08/23 04:31:22 INFO mapred.JobClient:   Map-Reduce Framework
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Map input records=586006142
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Reduce shuffle bytes=53567298
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Spilled Records=108996063
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Map output bytes=468042247685
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     CPU time spent (ms)=91162220
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Total committed heap usage (bytes)=981605744640
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Combine input records=32046224559
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     SPLIT_RAW_BYTES=382500
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Reduce input records=96063
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Reduce input groups=1000
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Combine output records=108902950
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Physical memory (bytes) snapshot=1147705057280
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Reduce output records=1000
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3221902118912
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Map output records=31937417672
>>>
>>>
>>> Thanks,
>>> Gaurav Dasgupta
>>>  On Thu, Aug 23, 2012 at 4:28 PM, Gaurav Dasgupta <[EMAIL PROTECTED]