Re: issue about Shuffled Maps in MR job summary
One important detail: my input files are very small, each less than 10 MB,
and I have a huge number of them.

On Thu, Dec 12, 2013 at 9:58 AM, java8964 <[EMAIL PROTECTED]> wrote:

>  Assume the block size is 128 MB and each of your mappers finishes within
> half a minute. Then there is not much logic in your mapper, since it can
> process 128 MB in around 30 seconds. If your reducers cannot finish within
> 1 week, then something is wrong.
>
> So you may need to find out the following:
>
> 1) How many mappers were generated in your MR job?
> 2) Have they all finished? (Check them in the JobTracker through the web UI
> or the command line.)
> 3) How many reducers are in this job?
> 4) Have the reducers started? What stage are they in? Copying/Sorting/Reducing?
> 5) If they are in the reducing stage, check the userlogs of the reducers. Is
> your code running now?
>
> You can find all of this information in the JobTracker web UI.
>
> Yong
>
>  ------------------------------
> Date: Thu, 12 Dec 2013 09:03:29 +0800
>
> Subject: Re: issue about Shuffled Maps in MR job summary
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
>
> Hi,
>     Suppose I have a cluster of 5 worker nodes, and each worker node can
> allocate 40 GB of memory. I do not care about the map tasks, because the map
> tasks in my job finish within half a minute. From what I observe, the really
> slow tasks are the reduces. I allocate 12 GB to each reduce task, so each
> worker node can run 3 reduces in parallel and the whole cluster can run 15
> reducers, and I run the job with all 15 reducers. I do not know whether
> increasing the reducer count from 15 to 30, with 6 GB allocated to each
> reduce, will speed up the job. The job runs in my production environment;
> it has been running for nearly 1 week and still has not finished.
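
The slot arithmetic in the message above can be sketched as follows. This is only an illustration, not a model of the scheduler: the function names are made up for this example, and it assumes memory is the only thing limiting how many reducers a node can run.

```python
def reducers_per_node(node_mem_gb, reducer_mem_gb):
    # Whole reducers that fit into one node's memory (hypothetical helper).
    return node_mem_gb // reducer_mem_gb


def cluster_reducer_capacity(nodes, node_mem_gb, reducer_mem_gb):
    # Assumes every node is identical and memory is the only constraint.
    return nodes * reducers_per_node(node_mem_gb, reducer_mem_gb)


# Numbers taken from the message: 5 nodes, 40 GB each.
for reducer_mem_gb in (12, 6):
    capacity = cluster_reducer_capacity(5, 40, reducer_mem_gb)
    print(f"{reducer_mem_gb} GB per reducer -> {capacity} parallel reducers")
```

With 12 GB per reducer this gives 3 per node and 15 in total; with 6 GB it gives 6 per node and 30 in total. Whether the extra parallelism helps still depends on how the data is spread across reducers.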
>
> On Wed, Dec 11, 2013 at 9:50 PM, java8964 <[EMAIL PROTECTED]> wrote:
>
>  The overall job completion time depends on a lot of factors. Are you sure
> the reducer part is the bottleneck?
>
> It also depends on how many reducer input groups your MR job has. If you
> only have 20 reducer input groups, then even if you raise your reducer count
> to 40, the elapsed time of the reducer part won't change much, because the
> additional 20 reducer tasks won't get any data to process.
>
> If you have a lot of reducer input groups, your cluster has capacity at
> this time, and you also have a lot of idle reducer slots, then increasing
> your reducer count should decrease your overall job completion time.
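
The point about reducer input groups can be checked with a small simulation. This is a sketch under stated assumptions: the key names are invented, and `zlib.crc32` stands in for the key hash (Hadoop's default partitioner uses the key's own hash code modulo the reducer count, which this only imitates).

```python
import zlib


def partition(key, num_reducers):
    # Stand-in for hash partitioning: hash the key, take it modulo the
    # number of reducers. crc32 is an assumption made for repeatability.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def busy_reducers(keys, num_reducers):
    # Set of reducer indexes that receive at least one key.
    return {partition(k, num_reducers) for k in keys}


# 20 hypothetical distinct keys, i.e. 20 reducer input groups.
keys = [f"key-{i}" for i in range(20)]
for n in (20, 40):
    print(f"{n} reducers configured, {len(busy_reducers(keys, n))} receive data")
```

However many reducers are configured, no more than 20 of them can receive data here, because all records of one key go to the same reducer.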
>
> Make sense?
>
> Yong
>
>  ------------------------------
> Date: Wed, 11 Dec 2013 14:20:24 +0800
> Subject: Re: issue about Shuffled Maps in MR job summary
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
>
>
> I read the docs and found that if I have 8 reducers, a map task will output
> 8 partitions, and each partition will be sent to a different reducer. So if
> I increase the number of reducers, the number of partitions increases, but
> the volume of network traffic stays the same. Why does increasing the number
> of reducers sometimes not decrease the job completion time?
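
One way to look at this question is to separate the total shuffle volume from the number of pieces it is cut into. The sketch below is a rough illustration only: the 1000 MB total map output is a made-up figure, the helper name is hypothetical, and it assumes map output is spread evenly over all partitions.

```python
def shuffle_profile(total_map_output_mb, num_maps, num_reducers):
    # Each map task writes one partition (segment) per reducer.
    segments = num_maps * num_reducers
    return {
        "segments": segments,
        "mb_per_reducer": total_map_output_mb / num_reducers,
        "mb_per_segment": total_map_output_mb / segments,
    }


# Hypothetical job: 20 map tasks producing 1000 MB of map output in total.
for reducers in (8, 16):
    print(reducers, shuffle_profile(1000, 20, reducers))
```

Doubling the reducers halves the data each reducer handles, but it also doubles the number of segments to fetch while the total volume stays the same. If the data is not evenly spread, or the reducers are not the bottleneck, the job may not get faster.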
>
> On Wed, Dec 11, 2013 at 1:48 PM, Vinayakumar B <[EMAIL PROTECTED]> wrote:
>
>  It looks simple :)
>
> Shuffled Maps = Number of Map Tasks * Number of Reducers
>
> Thanks and Regards,
> Vinayakumar B
>
> *From:* ch huang [mailto:[EMAIL PROTECTED]]
> *Sent:* 11 December 2013 10:56
> *To:* [EMAIL PROTECTED]
> *Subject:* issue about Shuffled Maps in MR job summary
>
> Hi, mailing list:
>            I ran terasort with 16 reducers and with 8 reducers. When I
> double the number of reducers, Shuffled Maps also doubles. My question: the
> job only runs 20 map tasks (the input is 10 files of 100 MB each and my
> block size is 64 MB, so there are 20 splits), so why do I need to shuffle
> 160 maps in the 8-reducer run and 320 maps in the 16-reducer run? How is the
> Shuffled Maps number calculated?
>
> 16 reducer summary output:
>
>
>  Shuffled Maps =320
>
>  8 reducer summary output:
>
> Shuffled Maps =160
>
>
>
>
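
The numbers in the original question can be reproduced from the formula given in the thread (Shuffled Maps = Number of Map Tasks * Number of Reducers). This is a minimal sketch: the helper names are made up, and the split count is simplified to one split per block, ignoring any tolerance Hadoop may apply to a short final split.

```python
import math


def num_splits(file_size_mb, block_size_mb):
    # Simplified: one input split (and so one map task) per started block.
    return math.ceil(file_size_mb / block_size_mb)


def shuffled_maps(num_map_tasks, num_reducers):
    # Every reducer fetches one map output segment from every map task.
    return num_map_tasks * num_reducers


# Input from the question: 10 files of 100 MB each, 64 MB block size.
file_sizes_mb = [100] * 10
map_tasks = sum(num_splits(size, 64) for size in file_sizes_mb)
print("map tasks:", map_tasks)

for reducers in (8, 16):
    print(reducers, "reducers ->", shuffled_maps(map_tasks, reducers))
```

This yields 20 map tasks, 160 shuffled maps for 8 reducers and 320 for 16, matching the two job summaries quoted in the question.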