Re: Fw: Problems about the job counters
Hi Hailong,

       An important phase between the map and reduce tasks is the 'shuffle'. On the
map side, output records first fill an in-memory buffer and are then spilled to
disk as local spill files (temporary files). If a map task produces a large
amount of output, it creates several spill files, and those spill files must be
merged into a single target file on the map side.
So the map task *reads* the spill file contents back from local disk into memory
and writes the merged records out to disk again. FILE_BYTES_READ on the map side
counts the bytes read back from local disk during this merge of spill files, and
FILE_BYTES_WRITTEN is the total number of bytes spilled (and merged) to local disk.
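To make that accounting concrete, here is a toy Python sketch of the idea. This is an illustration only, not Hadoop's actual implementation; the function name, the single-pass merge assumption, and the inputs are invented for the example:

```python
# Toy model of map-side local-disk counter accounting (illustrative only).
# Assumes one merge pass that re-reads every spill file and writes one
# merged output file; real Hadoop may merge in multiple passes.

def map_side_counters(spill_sizes):
    """spill_sizes: bytes of each local spill file written by one map task.

    Returns (file_bytes_read, file_bytes_written) for local disk I/O."""
    file_bytes_written = sum(spill_sizes)   # each spill is written to local disk
    file_bytes_read = 0
    if len(spill_sizes) > 1:
        # the merge re-reads every spill file from local disk ...
        file_bytes_read += sum(spill_sizes)
        # ... and writes the single merged file back to local disk
        file_bytes_written += sum(spill_sizes)
    return file_bytes_read, file_bytes_written

# Three spills of 100 bytes: the merge reads 300 back and the task has
# written 300 (spills) + 300 (merged file) = 600 bytes in total.
print(map_side_counters([100, 100, 100]))   # (300, 600)
# A single spill needs no merge, so nothing is read back:
print(map_side_counters([500]))             # (0, 500)
```

Under this toy model, a map task whose output fits in one spill shows FILE_BYTES_READ of zero, which is why the counter is non-zero only when spilling and merging actually happen.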

      HDFS_BYTES_READ only represents the map input bytes from HDFS.
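Also note that in the counter table, the Total column is simply the Map and Reduce columns added together. A quick arithmetic check against the FileSystemCounters you posted (plain Python, numbers copied from the table):

```python
# Sanity check: Total = Map + Reduce for each FileSystemCounter posted.
counters = {
    "FILE_BYTES_READ":    (22_863_580_656, 17_654_943_341, 40_518_523_997),
    "HDFS_BYTES_READ":    (154_400_997_459, 0, 154_400_997_459),
    "FILE_BYTES_WRITTEN": (33_490_829_403, 17_654_943_341, 51_145_772_744),
    "HDFS_BYTES_WRITTEN": (0, 2_747_356_704, 2_747_356_704),
}
for name, (map_bytes, reduce_bytes, total) in counters.items():
    assert map_bytes + reduce_bytes == total, name
```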

     I wrote a blog post (in Chinese) that explains the 'shuffle' phase in more detail:
     http://langyu.iteye.com/blog/992916

--Regards
Denny Ye

2011/6/15 hailong.yang1115 <[EMAIL PROTECTED]>

>
> Sorry for sending this email again, but I got no answers to the first one.
> Could anyone please help, or forward it to a mailing list that could?
>
> 2011-06-15
>  ------------------------------
>   ***********************************************
> * Hailong Yang, PhD. Candidate
> * Sino-German Joint Software Institute,
> * School of Computer Science&Engineering, Beihang University
> * Phone: (86-010)82315908
> * Email: [EMAIL PROTECTED]
> * Address: G413, New Main Building in Beihang University,
> *              No.37 XueYuan Road,HaiDian District,
> *              Beijing,P.R.China,100191
> ***********************************************
> ------------------------------
> *From:* hailong.yang1115
> *Sent:* 2011-06-10 13:28:46
> *To:* general
> *Cc:*
> *Subject:* Problems about the job counters
>
>  Dear all,
>
> I am trying the built-in wordcount example with nearly 15GB of input. When
> the Hadoop job finished, I got the following counters.
>
>
> Counter                     Map              Reduce           Total
> Job Counters
>   Launched reduce tasks                 0                0                  1
>   Rack-local map tasks                  0                0                 35
>   Launched map tasks                    0                0              2,318
>   Data-local map tasks                  0                0              2,283
> FileSystemCounters
>   FILE_BYTES_READ          22,863,580,656   17,654,943,341     40,518,523,997
>   HDFS_BYTES_READ         154,400,997,459                0    154,400,997,459
>   FILE_BYTES_WRITTEN       33,490,829,403   17,654,943,341     51,145,772,744
>   HDFS_BYTES_WRITTEN                    0    2,747,356,704      2,747,356,704
>
> My question is: what does the FILE_BYTES_READ counter mean? And what is the
> difference between FILE_BYTES_READ and HDFS_BYTES_READ? As I understand it, all
> the input is located in HDFS, so where does FILE_BYTES_READ come from during
> the map phase?
>
>
> Any help will be appreciated!
>
> Hailong
>
> 2011-06-10
> ------------------------------
>  ***********************************************
> * Hailong Yang, PhD. Candidate
> * Sino-German Joint Software Institute,
> * School of Computer Science&Engineering, Beihang University
> * Phone: (86-010)82315908
> * Email: [EMAIL PROTECTED]
> * Address: G413, New Main Building in Beihang University,
> *              No.37 XueYuan Road,HaiDian District,
> *              Beijing,P.R.China,100191
> ***********************************************
>
hailong.yang1115 2011-06-29, 07:19