Re: Info required regarding JobTracker Job Details/Metrics
Gaurav Dasgupta, 2012-08-23, 12:06
Hi,
Thanks for your replies. Any idea how I can calculate the "total shuffle time"? I can get the total time taken by all the Mappers and by all the Reducers separately, but the intermediate shuffle/sort time is missing. Any clue?

Thanks,
Gaurav Dasgupta

On Thu, Aug 23, 2012 at 5:26 PM, Sonal Goyal <[EMAIL PROTECTED]> wrote:

> Gaurav,
>
> You can also refer to Tom White's Hadoop, The Definitive Guide, Chapter 8,
> which has a reference to each of the job counters. I believe the Apache
> site also had a page detailing the counters, but I can't seem to locate it.
>
> Best Regards,
> Sonal
> Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
> Nube Technologies <http://www.nubetech.co/>
>
> <http://in.linkedin.com/in/sonalgoyal>
>
> On Thu, Aug 23, 2012 at 5:20 PM, Bejoy Ks <[EMAIL PROTECTED]> wrote:
>
>> Hi Gaurav
>>
>> If it is just a simple word count example:
>> Map input size = HDFS_BYTES_READ
>> Reduce output size = HDFS_BYTES_WRITTEN
>> Reduce input size should be Map output bytes
>>
>> FILE_BYTES_WRITTEN is what the job writes to the local file system.
>> AFAIK it is the map tasks' intermediate output written to LFS.
>>
>> Regards
>> Bejoy KS
>>
>> On Thu, Aug 23, 2012 at 4:54 PM, Gaurav Dasgupta <[EMAIL PROTECTED]> wrote:
>>
>>> Sorry, the correct outcomes for the single wordcount job are:
>>>
>>> 12/08/23 04:31:22 INFO mapred.JobClient: Job complete: job_201208230144_0002
>>> 12/08/23 04:31:22 INFO mapred.JobClient: Counters: 26
>>> 12/08/23 04:31:22 INFO mapred.JobClient: Job Counters
>>> 12/08/23 04:31:22 INFO mapred.JobClient: Launched reduce tasks=64
>>> 12/08/23 04:31:22 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=103718235
>>> 12/08/23 04:31:22 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
>>> 12/08/23 04:31:22 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
>>> 12/08/23 04:31:22 INFO mapred.JobClient: Launched map tasks=3060
>>> 12/08/23 04:31:22 INFO mapred.JobClient: Data-local map tasks=3060
>>> 12/08/23 04:31:22 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=9208855
>>> 12/08/23 04:31:22 INFO mapred.JobClient: FileSystemCounters
>>> 12/08/23 04:31:22 INFO mapred.JobClient: FILE_BYTES_READ=58263069209
>>> 12/08/23 04:31:22 INFO mapred.JobClient: HDFS_BYTES_READ=394195953674
>>> 12/08/23 04:31:22 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2046757548
>>> 12/08/23 04:31:22 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=28095
>>> 12/08/23 04:31:22 INFO mapred.JobClient: Map-Reduce Framework
>>> 12/08/23 04:31:22 INFO mapred.JobClient: Map input records=586006142
>>> 12/08/23 04:31:22 INFO mapred.JobClient: Reduce shuffle bytes=53567298
>>> 12/08/23 04:31:22 INFO mapred.JobClient: Spilled Records=108996063
>>> 12/08/23 04:31:22 INFO mapred.JobClient: Map output bytes=468042247685
>>> 12/08/23 04:31:22 INFO mapred.JobClient: CPU time spent (ms)=91162220
>>> 12/08/23 04:31:22 INFO mapred.JobClient: Total committed heap usage (bytes)=981605744640
>>> 12/08/23 04:31:22 INFO mapred.JobClient: Combine input records=32046224559
>>> 12/08/23 04:31:22 INFO mapred.JobClient: SPLIT_RAW_BYTES=382500
>>> 12/08/23 04:31:22 INFO mapred.JobClient: Reduce input records=96063
>>> 12/08/23 04:31:22 INFO mapred.JobClient: Reduce input groups=1000
>>> 12/08/23 04:31:22 INFO mapred.JobClient: Combine output records=108902950
>>> 12/08/23 04:31:22 INFO mapred.JobClient: Physical memory (bytes) snapshot=1147705057280
>>> 12/08/23 04:31:22 INFO mapred.JobClient: Reduce output records=1000
>>> 12/08/23 04:31:22 INFO mapred.JobClient: Virtual memory (bytes) snapshot=3221902118912
>>> 12/08/23 04:31:22 INFO mapred.JobClient: Map output records=31937417672
>>>
>>> Thanks,
>>> Gaurav Dasgupta
>>> On Thu, Aug 23, 2012 at 4:28 PM, Gaurav Dasgupta <[EMAIL PROTECTED]
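A small sketch of how Bejoy's mappings could be read straight off the JobClient output quoted in this thread. It assumes the counter lines look exactly like the ones above (`INFO mapred.JobClient: NAME=VALUE`); `CounterSummary` and `parse` are made-up names for illustration, not a Hadoop API. Since this job runs a combiner, it also prints `Reduce shuffle bytes`, which as I understand it counts what the reducers actually fetched, and which here is far smaller than `Map output bytes`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CounterSummary {

    // Captures NAME and VALUE from lines such as
    // "12/08/23 04:31:22 INFO mapred.JobClient: HDFS_BYTES_READ=394195953674".
    private static final Pattern COUNTER_LINE =
            Pattern.compile("INFO mapred\\.JobClient:\\s+(.+?)=(\\d+)\\s*$");

    /** Collects every NAME=VALUE counter line; group headers and other lines are skipped. */
    static Map<String, Long> parse(String jobClientOutput) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : jobClientOutput.split("\\R")) {
            Matcher m = COUNTER_LINE.matcher(line);
            if (m.find()) {
                counters.put(m.group(1).trim(), Long.parseLong(m.group(2)));
            }
        }
        return counters;
    }

    public static void main(String[] args) {
        // A few lines copied from the job output quoted above.
        String output = String.join("\n",
                "12/08/23 04:31:22 INFO mapred.JobClient: HDFS_BYTES_READ=394195953674",
                "12/08/23 04:31:22 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=28095",
                "12/08/23 04:31:22 INFO mapred.JobClient: Reduce shuffle bytes=53567298",
                "12/08/23 04:31:22 INFO mapred.JobClient: Map output bytes=468042247685");
        Map<String, Long> c = parse(output);

        // Mappings from Bejoy's reply (simple word count case).
        System.out.println("Map input size       = " + c.get("HDFS_BYTES_READ"));
        System.out.println("Reduce input size    = " + c.get("Map output bytes"));
        System.out.println("Reduce output size   = " + c.get("HDFS_BYTES_WRITTEN"));
        // Bytes the reducers fetched during the shuffle (after the combiner ran).
        System.out.println("Reduce shuffle bytes = " + c.get("Reduce shuffle bytes"));
    }
}
```

Pasting the whole counter block into `output` works the same way; lines without `NAME=VALUE`, such as `Job Counters` or `Counters: 26`, are simply ignored.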
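On the "total shuffle time" question: none of the counters above seems to be a shuffle-time counter, but as far as I know the shuffle (copy) and sort (merge) phases run inside each reduce task, so they are already part of SLOTS_MILLIS_REDUCES. If you can get per-reduce-attempt timestamps (as far as I know, the Hadoop 1.x job history records shuffle-finished and sort-finished times for reduce attempts; extracting them is not covered here), a sketch like the following could add them up. `ShuffleTime`, `ReduceAttempt` and its fields are hypothetical names, and the sample timestamps are made up. Shuffle time is measured from the attempt's start, so it includes any time a reducer spent waiting for maps to finish.

```java
import java.util.Arrays;
import java.util.List;

public class ShuffleTime {

    /** Hypothetical timestamps (epoch ms) for one successful reduce attempt. */
    static final class ReduceAttempt {
        final long start, shuffleFinished, sortFinished, finish;

        ReduceAttempt(long start, long shuffleFinished, long sortFinished, long finish) {
            this.start = start;
            this.shuffleFinished = shuffleFinished;
            this.sortFinished = sortFinished;
            this.finish = finish;
        }

        long shuffleMillis() { return shuffleFinished - start; }        // copy phase
        long sortMillis()    { return sortFinished - shuffleFinished; } // merge/sort phase
        long reduceMillis()  { return finish - sortFinished; }          // reduce() phase
    }

    /** Sum of per-attempt shuffle durations (task time, not wall-clock). */
    static long totalShuffleMillis(List<ReduceAttempt> attempts) {
        long total = 0;
        for (ReduceAttempt a : attempts) total += a.shuffleMillis();
        return total;
    }

    /** Sum of per-attempt sort durations (task time, not wall-clock). */
    static long totalSortMillis(List<ReduceAttempt> attempts) {
        long total = 0;
        for (ReduceAttempt a : attempts) total += a.sortMillis();
        return total;
    }

    /** Wall-clock span from the first reduce start to the last shuffle completion. */
    static long shuffleWallClockMillis(List<ReduceAttempt> attempts) {
        if (attempts.isEmpty()) return 0;
        long firstStart = Long.MAX_VALUE;
        long lastShuffleEnd = Long.MIN_VALUE;
        for (ReduceAttempt a : attempts) {
            firstStart = Math.min(firstStart, a.start);
            lastShuffleEnd = Math.max(lastShuffleEnd, a.shuffleFinished);
        }
        return lastShuffleEnd - firstStart;
    }

    public static void main(String[] args) {
        // Made-up sample values, for illustration only.
        List<ReduceAttempt> attempts = Arrays.asList(
                new ReduceAttempt(1_000, 61_000, 64_000, 90_000),
                new ReduceAttempt(2_000, 52_000, 55_000, 80_000));
        System.out.println("Total shuffle time (ms): " + totalShuffleMillis(attempts));
        System.out.println("Total sort time (ms):    " + totalSortMillis(attempts));
        System.out.println("Shuffle wall-clock (ms): " + shuffleWallClockMillis(attempts));
    }
}
```

The summed figures are task time, roughly comparable to SLOTS_MILLIS_REDUCES; the wall-clock figure is closer to how long the job as a whole spent shuffling, since the 64 reducers shuffle in parallel.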