Hadoop, mail # user - Slow shuffle stage?


Re: Slow shuffle stage?
Keith Wiley 2011-11-11, 15:52
892 nodes, 4 tasks each, 3:1 mapper/reducer ratio.  Each map task outputs four records, ~18MB each.  They are fairly evenly distributed to the 17 reducers.  As to the bandwidth of the cluster, I don't really know.  I'll look into that.
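As a rough sanity check on those numbers, the shuffle volume and the implied per-reducer copy rate can be estimated with a few lines of Python. This is only a back-of-envelope sketch: it takes the figures quoted in this thread (266 maps, 4 records per map, ~18MB per record, 17 reducers, ~27 minutes of shuffle), assumes the records are spread perfectly evenly, and ignores compression and replication. The function and variable names are made up for illustration.

```python
# Back-of-envelope shuffle estimate from the figures quoted in this thread.
# Assumptions: perfectly even distribution, no compression, 1 MB = 1 unit.

def shuffle_estimate(maps, records_per_map, record_mb, reducers, shuffle_minutes):
    """Return rough totals for the data each reducer must copy."""
    total_records = maps * records_per_map
    total_mb = total_records * record_mb
    per_reducer_mb = total_mb / reducers
    # Average copy rate if the whole shuffle time were spent transferring data.
    rate_mb_per_s = per_reducer_mb / (shuffle_minutes * 60)
    return {
        "total_records": total_records,
        "records_per_reducer": total_records / reducers,
        "total_mb": total_mb,
        "per_reducer_mb": per_reducer_mb,
        "rate_mb_per_s": rate_mb_per_s,
    }

estimate = shuffle_estimate(maps=266, records_per_map=4, record_mb=18,
                            reducers=17, shuffle_minutes=27)
for name, value in estimate.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions each reducer pulls a bit over 1GB, which works out to well under 1MB/s averaged across the 27 minutes. If the reported Shuffle time also includes time the reducer spends waiting for map outputs to become available, the actual transfer rate would be higher than this average suggests.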

On Nov 10, 2011, at 7:07 PM, Prashant Sharma wrote:

> Can you tell us about your cluster? Is it a single node? How big is your
> data? What is the bandwidth between nodes? (The copy might take time if
> bandwidth is limited.)
> -P
>
> On Fri, Nov 11, 2011 at 6:50 AM, Keith Wiley <[EMAIL PROTECTED]> wrote:
>
>> What sorts of causes might be responsible for a long or slow shuffle
>> stage?  For example, I have a job of 266 maps (each emitting 4 records) and
>> 17 reduces (each ingesting about 60 records) that takes 72 minutes to
>> complete.  The maps tend to run in about 9-13 minutes (the value in
>> parentheses under the Finish Time column of the map task list in the job
>> tracker) and the reduces run in about 37 minutes (same column).  If I click
>> into a specific reduce task, I see a Finish Time of 37 minutes of course,
>> and a Shuffle time of about 27 minutes.
>>
>> So, 11 minutes were spent in the maps, 10 in the reduces, and 27
>> shuffling.  Note that the 72 minute overall job time is considerably longer
>> than the sum of these three averages because of a few outlier maps (25
>> minutes, one even took 37 minutes) that held up the later stages.
>>
>> Disregarding the outliers, it's still spending more than 50% of the job
>> time (27 out of 48 minutes) shuffling instead of doing actual computation
>> in the maps and reducers.  This feels inefficient to me.
>>
>> What causes this and what can be done to improve it?
>>
>> Thanks.
________________________________________________________________________________
Keith Wiley     [EMAIL PROTECTED]     keithwiley.com    music.keithwiley.com

"What I primarily learned in grad school is how much I *don't* know.
Consequently, I left grad school with a higher ignorance to knowledge ratio than
when I entered."
                                           --  Keith Wiley
________________________________________________________________________________