-
How do I diagnose a really slow copy
Steve Lewis 2011-11-04, 15:20
I have been finding that my cluster is running abnormally slowly. A typical reduce task reports
reduce > copy (113 of 431 at 0.07 MB/s)
70 KB/s is a truly dreadful rate, and tasks are running much slower under Hadoop than the same code performing the same operations on a single box. Where do I look to find why IO operations might be so slow?
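For reference, the arithmetic behind the status line quoted above can be sketched in a few lines of Python. This is only an illustration: the helper name `parse_copy_status` is hypothetical, and the conversion assumes 1 MB = 1024 KB, which may not match how the rate is computed internally.

```python
import re

# Matches the copy-phase status text quoted in the message above.
STATUS_RE = re.compile(r"copy \((\d+) of (\d+) at ([\d.]+) MB/s\)")


def parse_copy_status(status):
    """Extract progress and rate from a 'reduce > copy (...)' status string.

    Hypothetical helper for illustration; assumes 1 MB = 1024 KB.
    """
    match = STATUS_RE.search(status)
    if match is None:
        raise ValueError("not a copy-phase status line: %r" % status)
    copied = int(match.group(1))
    total = int(match.group(2))
    mb_per_s = float(match.group(3))
    return {
        "copied": copied,                # map outputs fetched so far
        "total": total,                  # map outputs expected in total
        "fraction": copied / total,      # share of map outputs fetched
        "kb_per_s": mb_per_s * 1024,     # reported rate converted to KB/s
    }


if __name__ == "__main__":
    info = parse_copy_status("reduce > copy (113 of 431 at 0.07 MB/s)")
    print(info)
```

With the figures from the message, roughly a quarter of the map outputs have been fetched, at a reported rate of about 70 KB/s.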
-- Steven M. Lewis PhD
-
Re: How do I diagnose a really slow copy
Harsh J 2011-11-04, 16:07
Steve,
The copy phase may start early, and the slow copy could also just be due to unavailability of completed map outputs at this stage. Does your question eliminate that case here?
I'd also check the network speeds you get between two slave nodes, and whether your TaskTracker logs show any problems serving map output requests over HTTP.
Also, do you run any form of network filtering, firewalls, etc. that may be working at the packet level? I've seen that cause slowdowns before, but I am not sure that's the case here.
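One rough way to run the node-to-node network check suggested above, without installing anything, is a small TCP throughput test. The following is a minimal sketch, not a substitute for a proper benchmarking tool: the function names are hypothetical, the rate is measured on the sending side only (so the kernel's socket buffers can inflate it slightly), and `loopback_demo` merely sanity-checks the script on a single machine.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send/recv call


def serve_once(listener):
    """Accept one connection and discard data until the peer closes."""
    conn, _ = listener.accept()
    with conn:
        while conn.recv(CHUNK):
            pass


def measure_throughput(host, port, total_bytes=16 * 1024 * 1024):
    """Send total_bytes to host:port and return the sender-side rate in MB/s."""
    payload = b"\0" * CHUNK
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as sock:
        while sent < total_bytes:
            sock.sendall(payload)
            sent += len(payload)
    elapsed = time.perf_counter() - start
    return sent / (1024 * 1024) / elapsed


def loopback_demo():
    """Run receiver and sender on one machine as a sanity check."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    receiver = threading.Thread(target=serve_once, args=(listener,), daemon=True)
    receiver.start()
    rate = measure_throughput("127.0.0.1", port, total_bytes=4 * 1024 * 1024)
    receiver.join(timeout=5)
    listener.close()
    return rate


if __name__ == "__main__":
    print("loopback: %.1f MB/s" % loopback_demo())
```

To test between two slave nodes, one would run `serve_once` on a listener bound to a reachable address on the first node and call `measure_throughput` from the second. A result anywhere near 0.07 MB/s would point at the network rather than at Hadoop.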
-
Re: How do I diagnose a really slow copy
Steve Lewis 2011-11-04, 19:26
The task has been running for several hours, and the map phase is essentially a null mapper: it rewrites the key and value stored by an earlier reducer. There is no firewall. The entire job is running on an internal cluster, although admittedly it was launched from my local box on the company network. It is running WAY slower than jobs previously run on the same hardware, and I suspect something is wrong, but I lack the tools to even start diagnosing the issue.
-- Steven M. Lewis PhD 4221 105th Ave NE Kirkland, WA 98033 206-384-1340 (cell) Skype lordjoe_com