You can enable speculative execution on a per-job basis, assuming your
administrator hasn't marked it final. The parameter you want is
On Fri, Nov 11, 2011 at 10:57 AM, Keith Wiley <[EMAIL PROTECTED]> wrote:
> Oh, so the reported shuffle time incorporates the mapper time. I suppose that makes sense since the two stages overlap. That explains a lot.
> As to the map times, it's mostly due to task attempt failures. Any attempt that fails costs a lot of time. It is almost never a data-driven error, so subsequent attempts succeed, but it costs time when the earlier attempts fail. The cause is usually some cluster hairiness that I can neither control (I'm not the administrator) nor really comprehend: failure to retrieve blocks, random err 141 and 139 failures, that sort of thing.
> I'll check on the speculative execution. I can't remember...I think so though.
> On Nov 11, 2011, at 5:53 AM, Joey Echeverria wrote:
>> Another thing to look at is the map outlier. The shuffle will start by default when 5% of the maps are done, but won't finish until after the last map is done. Since one of your maps took 37 minutes, your shuffle will take at least that long.
>> I would check the following:
>> Is the input skewed?
>> Does the same node always result in the 37 minute map task?
>> Is speculative execution turned on?
>> On Nov 10, 2011, at 22:07, Prashant Sharma <[EMAIL PROTECTED]> wrote:
>>> Can you tell us about your cluster? Is it a single node? How big is your
>>> data? What about the bandwidth between nodes? (Because the copy might take time in that
>>> On Fri, Nov 11, 2011 at 6:50 AM, Keith Wiley <[EMAIL PROTECTED]> wrote:
>>>> What sorts of causes might be responsible for a long or slow shuffle
>>>> stage? For example, I have a job of 266 maps (each emitting 4 records) and
>>>> 17 reduces (each ingesting about 60 records) that takes 72 minutes to
>>>> complete. The maps tend to run in about 9-13 minutes (the value in
>>>> parentheses under the Finish Time column of the map task list in the job
>>>> tracker) and the reduces run in about 37 minutes (same column). If I click
>>>> into a specific reduce task, I see a Finish Time of 37 minutes of course,
>>>> and a Shuffle time of about 27 minutes.
>>>> So, 11 minutes were spent in the maps, 10 in the reduces, and 27
>>>> shuffling. Note that the 72 minute overall job time is considerably longer
>>>> than the sum of these three averages because of a few outlier maps (25
>>>> minutes, one even took 37 minutes) that held up the later stages.
>>>> Disregarding the outliers, it's still spending more than 50% of the job
>>>> time (27 out of 48 minutes) shuffling instead of doing actual computation
>>>> in the maps and reducers. This feels inefficient to me.
>>>> What causes this and what can be done to improve it?
> Keith Wiley [EMAIL PROTECTED] keithwiley.com music.keithwiley.com
> "It's a fine line between meticulous and obsessive-compulsive and a slippery
> rope between obsessive-compulsive and debilitatingly slow."
> -- Keith Wiley
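
For what it's worth, the timing arithmetic in this thread can be sketched in a few lines of Python. This is only a rough illustration, not anything Hadoop provides: the function names are made up, the list of map finish times is hypothetical (only the 9-13, 25 and 37 minute figures echo the numbers quoted above), and rounding the 5% threshold up to a whole map is my assumption about how the default behaves.

```python
import math


def shuffle_share(map_min, shuffle_min, reduce_min):
    """Fraction of the summed per-stage averages that is spent shuffling."""
    return shuffle_min / (map_min + shuffle_min + reduce_min)


def earliest_shuffle_start(map_finish_times, slowstart=0.05):
    """Time at which the given fraction of maps has finished.

    Assumption: the threshold is rounded up to a whole number of maps,
    with a minimum of one map.
    """
    needed = max(1, math.ceil(slowstart * len(map_finish_times)))
    return sorted(map_finish_times)[needed - 1]


def shuffle_finish_lower_bound(map_finish_times):
    """The shuffle cannot finish before the last map has finished."""
    return max(map_finish_times)


# Averages quoted in the thread (minutes): 11 map, 27 shuffle, 10 reduce.
share = shuffle_share(11, 27, 10)

# Hypothetical finish times (minutes since job start) with two outliers.
finish_times = [9, 10, 11, 12, 13, 25, 37]
start = earliest_shuffle_start(finish_times)
end = shuffle_finish_lower_bound(finish_times)

print(f"shuffle share of summed averages: {share:.1%}")
print(f"shuffle window is at least {end - start} minutes ({start} to {end})")
```

The point of the sketch is that a single slow map stretches the reported shuffle window even when the copy itself is cheap, so the 27 minute figure says more about the map outliers than about the network.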