MapReduce, mail # user - RE: issue about Shuffled Maps in MR job summary


RE: issue about Shuffled Maps in MR job summary
java8964 2013-12-12, 15:16
Alternatively, check your job history UI, which provides information similar to the JobTracker's, since you are using MR2 and YARN.
The default port of the job history UI is 19888.
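
If you would rather script the check than click through the web UI, the job history server also serves a REST API on the same port. A minimal sketch, assuming a hypothetical host name `historyserver.example.com`; the path `/ws/v1/history/mapreduce/jobs` is the job-listing endpoint of the MapReduce History Server REST API, and the helper names are made up for illustration:

```python
import json
import urllib.request


def history_jobs_url(host, port=19888):
    """Build the URL of the history server REST endpoint that lists jobs."""
    return f"http://{host}:{port}/ws/v1/history/mapreduce/jobs"


def fetch_jobs(host, port=19888):
    """Fetch the job list as parsed JSON (needs a reachable history server)."""
    request = urllib.request.Request(
        history_jobs_url(host, port),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical host name; replace it with your own history server.
print(history_jobs_url("historyserver.example.com"))
```

Only the URL is printed here; `fetch_jobs` is left uncalled because it needs a live cluster.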

From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: RE: issue about Shuffled Maps in MR job summary
Date: Thu, 12 Dec 2013 10:06:37 -0500
Then you can check your job's status in the YARN ResourceManager web UI to identify which step your reducers are in.

Date: Thu, 12 Dec 2013 11:12:47 +0800
Subject: Re: issue about Shuffled Maps in MR job summary
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]

One important point: my input files are very small, each less than 10 MB, and I have a huge number of them.
On Thu, Dec 12, 2013 at 9:58 AM, java8964 <[EMAIL PROTECTED]> wrote:

Assume the block size is 128 MB. If each of your mappers finishes within half a minute, then there is not much logic in your mapper, since it can process 128 MB in about 30 seconds. If your reducers cannot finish within 1 week, then something is wrong.
So you may need to find out the following:
1) How many mappers were generated in your MR job?
2) Have they all finished? (Check them in the JobTracker through the web UI or the command line.)
3) How many reducers are in this job?
4) Have the reducers started? What stage are they in: copying, sorting, or reducing?
5) If they are in the reducing stage, check the reducers' userlogs. Is your code running now?
You can find all of this information in the JobTracker web UI.
Yong
Date: Thu, 12 Dec 2013 09:03:29 +0800
Subject: Re: issue about Shuffled Maps in MR job summary
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]

Hi,
Suppose I have a cluster of 5 worker nodes, and each worker node can allocate 40 GB of memory. I do not care about the map tasks, because the map tasks in my job finish within half a minute. From what I observe, the really slow tasks are the reducers. I allocate 12 GB to each reduce task, so each worker node can run 3 reducers in parallel and the whole cluster can run 15. I run the job with all 15 reducers. What I do not know is whether increasing the reducer count from 15 to 30, with 6 GB allocated to each reducer, would speed up the job. The job runs in my production environment; it has been running for nearly 1 week and still has not finished.
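
The capacity arithmetic in this message can be checked with a small sketch. It assumes memory is the only limit and ignores map tasks, the application master container, and any rounding the scheduler applies; the function name is made up for illustration:

```python
def max_parallel_reducers(nodes, mem_per_node_gb, mem_per_reducer_gb):
    """Upper bound on concurrently running reducers when memory is the only limit."""
    reducers_per_node = mem_per_node_gb // mem_per_reducer_gb  # whole containers only
    return nodes * reducers_per_node


print(max_parallel_reducers(5, 40, 12))  # 3 per node -> 15
print(max_parallel_reducers(5, 40, 6))   # 6 per node -> 30
```

Both configurations fit in the cluster, so the open question is whether 30 smaller reducers actually finish sooner than 15 larger ones, not whether they can run at the same time.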

On Wed, Dec 11, 2013 at 9:50 PM, java8964 <[EMAIL PROTECTED]> wrote:

The overall job completion time depends on a lot of factors. Are you sure the reduce phase is the bottleneck?
It also depends on how many reducer input groups your MR job has. If you only have 20 reducer input groups, then even if you raise your reducer count to 40, the duration of the reduce phase won't change much, because the additional 20 reduce tasks won't get any data to process.

If you have a lot of reducer input groups, your cluster has spare capacity at this time, and you also have a lot of idle reducer slots, then increasing your reducer count should decrease your overall job completion time.
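
To illustrate the point about reducer input groups, here is a rough sketch. It assumes 20 hypothetical integer keys and uses a plain modulo as a stand-in for hash partitioning (Hadoop's default `HashPartitioner` takes the key's hash code modulo the number of reduce tasks); the function name is invented:

```python
def busy_reducers(keys, num_reducers):
    """Return the set of reducer indexes that receive at least one key."""
    return {key % num_reducers for key in keys}


keys = range(20)  # 20 distinct reducer input groups (hypothetical integer keys)
for num_reducers in (10, 20, 40):
    print(num_reducers, "reducers ->", len(busy_reducers(keys, num_reducers)), "with data")
```

With 40 reducers, only 20 of them get any data, so the extra 20 tasks start and finish without doing useful work.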

Make sense?
Yong
Date: Wed, 11 Dec 2013 14:20:24 +0800
Subject: Re: issue about Shuffled Maps in MR job summary
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
I read the docs and found that if I have 8 reducers, a map task outputs 8 partitions, and each partition is sent to a different reducer. So if I increase the number of reducers, the number of partitions increases, but the volume of network traffic stays the same. Why, then, does increasing the number of reducers sometimes not decrease the job completion time?

 
On Wed, Dec 11, 2013 at 1:48 PM, Vinayakumar B <[EMAIL PROTECTED]> wrote:

It looks simple :)

Shuffled Maps = Number of Map Tasks * Number of Reducers
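
Applied to the numbers in the original question (20 map tasks, run with 8 and then 16 reducers), the formula can be checked with a tiny sketch. The function name is made up; the reasoning it encodes is that every reducer fetches one partition from every map output, and the counter sums those fetches across reducers:

```python
def shuffled_maps(num_maps, num_reducers):
    """Each reducer fetches one partition from every map, so the fetches multiply."""
    return num_maps * num_reducers


print(shuffled_maps(20, 8))   # 160
print(shuffled_maps(20, 16))  # 320
```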

 
Thanks and Regards,

Vinayakumar B
 
From: ch huang [mailto:[EMAIL PROTECTED]]

Sent: 11 December 2013 10:56
To: [EMAIL PROTECTED]
Subject: issue about Shuffled Maps in MR job summary

 

Hi, mailing list:

I ran terasort with 16 reducers and with 8 reducers. When I double the number of reducers, the Shuffled Maps count also doubles. My question: the job only runs 20 map tasks (the input is 10 files of 100 MB each and my block size is 64 MB, so there are 20 splits), so why do I need to shuffle 160 maps in the 8-reducer run and 320 maps in the 16-reducer run? How is the Shuffled Maps number calculated?
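
The split count in the question can be reproduced with a short sketch. It assumes the split size equals the block size and that every file is split independently, and it ignores the small slack the input format allows on the last split; the function name is invented:

```python
import math


def num_splits(num_files, file_size_mb, block_size_mb):
    """Approximate number of input splits when split size equals block size."""
    splits_per_file = math.ceil(file_size_mb / block_size_mb)
    return num_files * splits_per_file


print(num_splits(10, 100, 64))  # 2 splits per file -> 20 map tasks
```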
 

16-reducer summary output:

Shuffled Maps =320

8-reducer summary output:

Shuffled Maps =160