Re: Slow shuffle stage?
Keith Wiley 2011-11-11, 15:52
892 nodes, 4 tasks each, 3:1 mapper/reducer ratio. Each map task outputs four records of ~18MB each, and they are fairly evenly distributed across the 17 reducers. As for the cluster's bandwidth, I don't really know; I'll look into that.
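For a rough sense of how much data the shuffle moves, here is a back-of-envelope sketch using the figures from this thread (266 maps, 4 records per map, ~18MB per record, 17 reducers, ~27 minutes of shuffle per reducer). It assumes every record is exactly 18MB, that records are spread evenly over the reducers, and that the whole reported shuffle time is spent copying. The reported shuffle time may also include time spent waiting for the last maps to finish, so the computed rate is a lower bound on the real copy speed, not a measurement of it. The function name `shuffle_estimate` is just for illustration.

```python
# Back-of-envelope shuffle estimate from the figures in this thread.
# Assumptions (simplifications, not measurements): every map output record
# is ~18 MB, records are spread evenly over the reducers, and the whole
# reported shuffle time is spent copying data.

def shuffle_estimate(num_maps, records_per_map, record_mb,
                     num_reducers, shuffle_minutes):
    total_records = num_maps * records_per_map
    total_mb = total_records * record_mb
    mb_per_reducer = total_mb / num_reducers
    # Effective copy rate seen by one reducer over its whole shuffle phase.
    mb_per_sec = mb_per_reducer / (shuffle_minutes * 60)
    return {
        "total_mb": total_mb,
        "records_per_reducer": total_records / num_reducers,
        "mb_per_reducer": mb_per_reducer,
        "mb_per_sec_per_reducer": mb_per_sec,
    }


if __name__ == "__main__":
    est = shuffle_estimate(num_maps=266, records_per_map=4, record_mb=18,
                           num_reducers=17, shuffle_minutes=27)
    print(f"total map output:    {est['total_mb']} MB")
    print(f"records per reducer: {est['records_per_reducer']:.1f}")
    print(f"MB per reducer:      {est['mb_per_reducer']:.0f}")
    print(f"effective rate:      {est['mb_per_sec_per_reducer']:.2f} MB/s per reducer")
```

With these numbers the job emits about 19GB of map output, roughly 63 records (~1.1GB) per reducer, and each reducer sees an effective rate of only about 0.7MB/s. Comparing that figure against the measured node-to-node bandwidth would show whether the network or something else dominates the shuffle time.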
On Nov 10, 2011, at 7:07 PM, Prashant Sharma wrote:
> Can you tell us about your cluster? Is it a single node? How big is your data,
> then? Or what is the bandwidth between nodes? (Because the copy might take time in that
> On Fri, Nov 11, 2011 at 6:50 AM, Keith Wiley <[EMAIL PROTECTED]> wrote:
>> What sorts of causes might be responsible for a long or slow shuffle
>> stage? For example, I have a job of 266 maps (each emitting 4 records) and
>> 17 reduces (each ingesting about 60 records) that takes 72 minutes to
>> complete. The maps tend to run in about 9-13 minutes (the value in
>> parentheses under the Finish Time column of the map task list in the job
>> tracker), and the reduces run in about 37 minutes (same column). If I click
>> into a specific reduce task, I see a Finish Time of 37 minutes of course,
>> and a Shuffle time of about 27 minutes.
>> So, 11 minutes were spent in the maps, 10 in the reduces, and 27
>> shuffling. Note that the 72 minute overall job time is considerably longer
>> than the sum of these three averages because of a few outlier maps (25
>> minutes; one even took 37 minutes) that held up the later stages.
>> Disregarding the outliers, it's still spending more than 50% of the job
>> time (27 out of 48 minutes) shuffling instead of doing actual computation
>> in the maps and reducers. This feels inefficient to me.
>> What causes this and what can be done to improve it?
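The phase arithmetic in the question can be checked with a short sketch. It assumes the map time is the midpoint of the quoted 9-13 minute range (11 minutes) and that a reducer's compute time is its total runtime minus its shuffle time; like the 48-minute figure in the question, it ignores the outlier maps. The helper name `phase_breakdown` is hypothetical.

```python
# Split the average task times quoted in the question into phases.
# Assumptions: map time is the midpoint of the 9-13 minute range, and a
# reducer's compute time is its total runtime minus its shuffle time.

def phase_breakdown(map_min, reduce_total_min, shuffle_min):
    reduce_compute_min = reduce_total_min - shuffle_min
    total_min = map_min + shuffle_min + reduce_compute_min
    return reduce_compute_min, total_min, shuffle_min / total_min


if __name__ == "__main__":
    reduce_min, total_min, share = phase_breakdown(11, 37, 27)
    print(f"reduce compute: {reduce_min} min, total: {total_min} min, "
          f"shuffle share: {share:.0%}")
    # prints "reduce compute: 10 min, total: 48 min, shuffle share: 56%"
```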
Keith Wiley [EMAIL PROTECTED] keithwiley.com music.keithwiley.com
"What I primarily learned in grad school is how much I *don't* know.
Consequently, I left grad school with a higher ignorance to knowledge ratio than
when I entered."
-- Keith Wiley