-Re: Slow shuffle stage?
Keith Wiley 2011-11-11, 15:57
Oh, so the reported shuffle time incorporate the mapper time. I suppose that makes sense since the two stages overlap. That explains a lot.
As to the map times, it mostly due to task attempt failures. Any attempt that fails costs a lot of time. It is almost never a data-driven error, so subsequent attempts succeed, but it costs time when the earlier attempts fail. The cause is usually some cluster hairiness that I can neither control (I'm not the administrator) or barely comprehend: failure to retrieve blocks, random err 141 and 139 failures, that sort of thing.
I'll check on the speculative execution. I can't remember...I think so though.
On Nov 11, 2011, at 5:53 AM, Joey Echeverria wrote:
> Another thing to look at is the map outlier. The shuffle will start by default when 5% of the maps are done, but won't finish until after the last map is done. Since one of your maps took 37 minutes, your shuffle take at least that long.
> I would check the following:
> Is the input skewed?
> Does the same node always result in the 37 minute map task?
> Is speculative execution turned on?
> On Nov 10, 2011, at 22:07, Prashant Sharma <[EMAIL PROTECTED]> wrote:
>> Can you tell us about your cluster, Is it single node? how big is your data
>> then.? Or the bandwidth between nodes. (cause copy might take time in that
>> On Fri, Nov 11, 2011 at 6:50 AM, Keith Wiley <[EMAIL PROTECTED]> wrote:
>>> What sorts of causes might be responsible for a long or slow shuffle
>>> stage? For example, I have a job of 266 maps (each emitting 4 records) and
>>> 17 reduces (each ingesting about 60 records) that takes 72 minutes to
>>> complete. The maps tend to run in about 9-13 minutes (the value in
>>> parentheses under the Finish Time column of the map task list in the job
>>> tracker and the reduces run in about 37 minutes (same column). If I click
>>> into a specific reduce task, I see a Finish Time of 37 minutes of course,
>>> and a Shuffle time of about 27 minutes.
>>> So, 11 minutes were spent in the maps, 10 in the reduces, and 27
>>> shuffling. Note that the 72 minute overall job time is considerably longer
>>> than the sum of these three averages because of a few outlier maps (25
>>> minutes, one even took 37 minutes) that held up the later stages).
>>> Disregarding the outliers, it's still spending more than 50% of the job
>>> time (27 out of 48 minutes) shuffling instead of doing actual computation
>>> in the maps and reducers. This feels inefficient to me.
>>> What causes this and what can be done to improve it?
Keith Wiley [EMAIL PROTECTED] keithwiley.com music.keithwiley.com
"It's a fine line between meticulous and obsessive-compulsive and a slippery
rope between obsessive-compulsive and debilitatingly slow."
-- Keith Wiley