MapReduce, mail # user - Info required regarding JobTracker Job Details/Metrics


Re: Info required regarding JobTracker Job Details/Metrics
Gaurav Dasgupta 2012-08-23, 11:24
Sorry, the correct outcomes for the single wordcount job are:

12/08/23 04:31:22 INFO mapred.JobClient: Job complete: job_201208230144_0002
12/08/23 04:31:22 INFO mapred.JobClient: Counters: 26
12/08/23 04:31:22 INFO mapred.JobClient:   Job Counters
12/08/23 04:31:22 INFO mapred.JobClient:     Launched reduce tasks=64
12/08/23 04:31:22 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=103718235
12/08/23 04:31:22 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/08/23 04:31:22 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/08/23 04:31:22 INFO mapred.JobClient:     Launched map tasks=3060
12/08/23 04:31:22 INFO mapred.JobClient:     Data-local map tasks=3060
12/08/23 04:31:22 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=9208855
12/08/23 04:31:22 INFO mapred.JobClient:   FileSystemCounters
12/08/23 04:31:22 INFO mapred.JobClient:     FILE_BYTES_READ=58263069209
12/08/23 04:31:22 INFO mapred.JobClient:     HDFS_BYTES_READ=394195953674
12/08/23 04:31:22 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=2046757548
12/08/23 04:31:22 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=28095
12/08/23 04:31:22 INFO mapred.JobClient:   Map-Reduce Framework
12/08/23 04:31:22 INFO mapred.JobClient:     Map input records=586006142
12/08/23 04:31:22 INFO mapred.JobClient:     Reduce shuffle bytes=53567298
12/08/23 04:31:22 INFO mapred.JobClient:     Spilled Records=108996063
12/08/23 04:31:22 INFO mapred.JobClient:     Map output bytes=468042247685
12/08/23 04:31:22 INFO mapred.JobClient:     CPU time spent (ms)=91162220
12/08/23 04:31:22 INFO mapred.JobClient:     Total committed heap usage (bytes)=981605744640
12/08/23 04:31:22 INFO mapred.JobClient:     Combine input records=32046224559
12/08/23 04:31:22 INFO mapred.JobClient:     SPLIT_RAW_BYTES=382500
12/08/23 04:31:22 INFO mapred.JobClient:     Reduce input records=96063
12/08/23 04:31:22 INFO mapred.JobClient:     Reduce input groups=1000
12/08/23 04:31:22 INFO mapred.JobClient:     Combine output records=108902950
12/08/23 04:31:22 INFO mapred.JobClient:     Physical memory (bytes) snapshot=1147705057280
12/08/23 04:31:22 INFO mapred.JobClient:     Reduce output records=1000
12/08/23 04:31:22 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3221902118912
12/08/23 04:31:22 INFO mapred.JobClient:     Map output records=31937417672
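Counters printed in the JobClient console format above can be pulled into a script for further analysis. Below is a minimal sketch, assuming the exact "name=value" line format shown in this output; the regex and the derived per-task average are illustrative, not part of any Hadoop API:

```python
import re

# A few sample lines in the JobClient console format above
# (values taken from this job's output).
log_lines = """\
12/08/23 04:31:22 INFO mapred.JobClient:     Launched map tasks=3060
12/08/23 04:31:22 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=103718235
12/08/23 04:31:22 INFO mapred.JobClient:     Reduce shuffle bytes=53567298
""".splitlines()

# Counter lines end in "name=value"; group headers like "Job Counters" do not,
# so they simply fail to match and are skipped.
counter_re = re.compile(r"mapred\.JobClient:\s+(.+?)=(\d+)$")

counters = {}
for line in log_lines:
    m = counter_re.search(line)
    if m:
        counters[m.group(1)] = int(m.group(2))

# Total bytes of map output fetched by the reducers during the shuffle.
print(counters["Reduce shuffle bytes"])
# Derived figure: average occupied-slot time per map task, in milliseconds.
print(counters["SLOTS_MILLIS_MAPS"] // counters["Launched map tasks"])
```

The same dictionary can then be used for any ratio of interest, e.g. spilled records per map task or bytes written per reduce.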
Thanks,
Gaurav Dasgupta
On Thu, Aug 23, 2012 at 4:28 PM, Gaurav Dasgupta <[EMAIL PROTECTED]> wrote:

> Hi Users,
>
> I have run a wordcount job on a Hadoop 0.20 cluster and the JobTracker Web
> UI gave me the following information after the successful completion of the
> job:
>
> *Job Counters*
> SLOTS_MILLIS_MAPS=5739
> Total time spent by all reduces waiting after reserving slots (ms)=0
> Total time spent by all maps waiting after reserving slots (ms)=0
> Launched map tasks=2
> SLOTS_MILLIS_REDUCES=0
> *FileSystemCounters*
> HDFS_BYTES_READ=158
> FILE_BYTES_WRITTEN=97422
> HDFS_BYTES_WRITTEN=10000
> *Map-Reduce Framework*
> Map input records=586006142
> Reduce shuffle bytes=53567298
> Spilled Records=108996063
> Map output bytes=468042247685
> CPU time spent (ms)=91162220
> Total committed heap usage (bytes)=981605744640
> Combine input records=32046224559
> SPLIT_RAW_BYTES=382500
> Reduce input records=96063
> Reduce input groups=1000
> Combine output records=108902950
> Physical memory (bytes) snapshot=1147705057280
> Reduce output records=1000
> Virtual memory (bytes) snapshot=3221902118912
> Map output records=31937417672
>
> Can someone explain all of the above metrics to me? I mainly want to know the
> "total shuffled bytes" of the jobs. Is it "Reduce shuffle bytes"? Also, how
> can I calculate the "total shuffle time taken"?
> Also, which of the above are the "Map Input Size", "Reduce Input Size" and
> "Reduce Output Size"?
> I also want to know the difference between "FILE_BYTES_WRITTEN" and
> "HDFS_BYTES_WRITTEN". What is it writing outside HDFS that is bigger in size