Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hadoop, mail # user - Slow shuffle stage?


Copy link to this message
-
Re: Slow shuffle stage?
Keith Wiley 2011-11-11, 15:57
Oh, so the reported shuffle time incorporate the mapper time.  I suppose that makes sense since the two stages overlap.  That explains a lot.

As to the map times, it mostly due to task attempt failures.  Any attempt that fails costs a lot of time.  It is almost never a data-driven error, so subsequent attempts succeed, but it costs time when the earlier attempts fail.  The cause is usually some cluster hairiness that I can neither control (I'm not the administrator) or barely comprehend: failure to retrieve blocks, random err 141 and 139 failures, that sort of thing.

I'll check on the speculative execution.  I can't remember...I think so though.

On Nov 11, 2011, at 5:53 AM, Joey Echeverria wrote:

> Another thing to look at is the map outlier. The shuffle will start by default when 5% of the maps are done, but won't finish until after the last map is done. Since one of your maps took 37 minutes, your shuffle take at least that long.
>
> I would check the following:
>
> Is the input skewed?
> Does the same node always result in the 37 minute map task?
> Is speculative execution turned on?
>
> -Joey
>
>
>
> On Nov 10, 2011, at 22:07, Prashant Sharma <[EMAIL PROTECTED]> wrote:
>
>> Can you tell us about your cluster, Is it single node? how big is your data
>> then.? Or the bandwidth between nodes. (cause copy might take time in that
>> case)
>> -P
>>
>> On Fri, Nov 11, 2011 at 6:50 AM, Keith Wiley <[EMAIL PROTECTED]> wrote:
>>
>>> What sorts of causes might be responsible for a long or slow shuffle
>>> stage?  For example, I have a job of 266 maps (each emitting 4 records) and
>>> 17 reduces (each ingesting about 60 records) that takes 72 minutes to
>>> complete.  The maps tend to run in about 9-13 minutes (the value in
>>> parentheses under the Finish Time column of the map task list in the job
>>> tracker and the reduces run in about 37 minutes (same column).  If I click
>>> into a specific reduce task, I see a Finish Time of 37 minutes of course,
>>> and a Shuffle time of about 27 minutes.
>>>
>>> So, 11 minutes were spent in the maps, 10 in the reduces, and 27
>>> shuffling.  Note that the 72 minute overall job time is considerably longer
>>> than the sum of these three averages because of a few outlier maps (25
>>> minutes, one even took 37 minutes) that held up the later stages).
>>>
>>> Disregarding the outliers, it's still spending more than 50% of the job
>>> time (27 out of 48 minutes) shuffling instead of doing actual computation
>>> in the maps and reducers.  This feels inefficient to me.
>>>
>>> What causes this and what can be done to improve it?
>>>
>>> Thanks.
________________________________________________________________________________
Keith Wiley     [EMAIL PROTECTED]     keithwiley.com    music.keithwiley.com

"It's a fine line between meticulous and obsessive-compulsive and a slippery
rope between obsessive-compulsive and debilitatingly slow."
                                           --  Keith Wiley
________________________________________________________________________________