-
Weird performance problem.
GUOJUN Zhu 2012-03-16, 21:23
We have a weird performance problem with a Hadoop job on our cluster. We have a 32-node experimental cluster of blades (2 hex-core CPUs each), one dedicated job tracker, and one dedicated namenode, running Cloudera's CDH3 (0.20.2-cdh3u3, 03b655719d13929bd68bb2c2f9cee615b389cea9). All nodes were bought together and installed with the same kickstart script. All run Red Hat 6.1 (Linux he3lxvd607 2.6.32-131.0.15.el6.x86_64 #1 SMP Tue May 10 15:42:40 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux).
When we run our job (~300 tasks), all tasks start at once, so there are about 10 tasks per node on average. On the higher half of the nodes (nodes 17-32) the load average is close to 10 and CPU usage is about 50%. However, the lower half (nodes 1-16) does not fully use the CPU: the load is about 1-3 and CPU usage is <10%. In the final metrics, map tasks in the lower half show about the same "CPU time spent (ms)" count as those in the higher half. So it looks as if something throttles the tasks in the lower half (1-16). We checked for differences between the two sets of nodes in every aspect we could think of and found none.
Our job uses the old mapred API. It has quite modest input (<1 GB for 300 maps) and very small output. The intermediate output from the maps is larger (maybe 10x the input). The slow part is actually within the map, where we convert the input format into some classes before we can do the real calculation.
We then physically swapped the blades in 1-16 with the blades in 17-32. We still see the under-utilization in what is now 1-16. So it looks more like a configuration issue in Hadoop or the system.
We have run out of ideas. Any suggestions are highly appreciated.
When we run terasort or word-count, they seem to use all nodes equally.
Zhu, Guojun Modeling Sr Graduate 571-3824370 [EMAIL PROTECTED] Financial Engineering Freddie Mac
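Since the message above suspects a configuration difference between the two halves of the cluster, one way to check is to compare the Hadoop configuration files of a slow node and a fast node property by property. The following is only a minimal sketch, not something from the thread: the two XML snippets and their values are hypothetical stand-ins for the real per-node files (for example mapred-site.xml), and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Parse Hadoop-style <configuration> XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def diff_properties(a, b):
    """Return {name: (value_in_a, value_in_b)} for every property that differs."""
    diffs = {}
    for name in sorted(set(a) | set(b)):
        if a.get(name) != b.get(name):
            diffs[name] = (a.get(name), b.get(name))
    return diffs


# Hypothetical contents; in practice these would be read from files
# copied from one node in 1-16 and one node in 17-32.
SLOW_NODE = """<configuration>
  <property><name>mapred.tasktracker.map.tasks.maximum</name><value>10</value></property>
  <property><name>mapred.child.java.opts</name><value>-Xmx512m</value></property>
</configuration>"""

FAST_NODE = """<configuration>
  <property><name>mapred.tasktracker.map.tasks.maximum</name><value>10</value></property>
  <property><name>mapred.child.java.opts</name><value>-Xmx1024m</value></property>
</configuration>"""

if __name__ == "__main__":
    diffs = diff_properties(load_properties(SLOW_NODE), load_properties(FAST_NODE))
    for name, (slow, fast) in diffs.items():
        print(f"{name}: slow node = {slow!r}, fast node = {fast!r}")
```

A property present on only one node shows up with None on the other side, which is often the more interesting case. The same comparison could be applied to any other per-node settings that can be dumped as name/value pairs.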
-
Re: Weird performance problem.
bejoy.hadoop@... 2012-03-16, 21:58
Hi,
Some areas to look at:
- Are the majority of your tasks data-local?
- How is the rack topology set up?
- Is the data uniformly distributed across nodes?
Regards Bejoy K S
From handheld, Please excuse typos.
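The questions above about locality and data distribution can be answered from per-task records, for example ones collected from the job history. The sketch below is illustrative only: the record format (host, is_data_local), the host names, and the 50% tolerance are assumptions, not anything taken from the thread or from a Hadoop API.

```python
from collections import Counter


def data_local_fraction(records):
    """Fraction of tasks that ran on a node holding their input data."""
    return sum(1 for _, is_local in records if is_local) / len(records)


def tasks_per_host(records):
    """Count how many tasks ran on each host."""
    return Counter(host for host, _ in records)


def skewed_hosts(counts, tolerance=0.5):
    """Hosts whose task count deviates from the mean by more than
    `tolerance` (expressed as a fraction of the mean)."""
    mean = sum(counts.values()) / len(counts)
    return sorted(h for h, c in counts.items() if abs(c - mean) > tolerance * mean)


# Hypothetical records: (host, is_data_local)
RECORDS = (
    [("node01", True)] * 8 + [("node01", False)] * 2
    + [("node02", True)] * 9
    + [("node17", False)] * 2
)

if __name__ == "__main__":
    counts = tasks_per_host(RECORDS)
    print("data-local fraction:", round(data_local_fraction(RECORDS), 2))
    print("tasks per host:", dict(counts))
    print("hosts far from the mean:", skewed_hosts(counts))
```

An even task count per host does not prove the input blocks are evenly spread, so this only covers the scheduling side of the question.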
-
Re: Weird performance problem.
Jie Li 2012-03-17, 00:46
Did you try using Hadoop Vaidya or Karmasphere to diagnose the problem?
Jie
-
Re: Weird performance problem.
Vitthal "Suhas" Gogate 2012-03-17, 01:01
Can you send me the job configuration and job history log for the job? I want to see if Vaidya discovers the problem. --Suhas
-
Re: Weird performance problem.
GUOJUN Zhu 2012-03-19, 13:56
Thank you very much. The job is mostly data-local: of 608 tasks, 427 are data-local maps. One peculiar thing about it is that each input split is smaller than a data block, about 4 MB. The job has substantial computation to do, though. We only observed a short I/O burst and network burst (<1 min); after that the slow tasks run for a bit over 10 minutes, while the faster ones (on good nodes) finish in about 4 minutes.
All nodes are in the same rack, connected by a 10G Ethernet network.
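The figures in this message can be turned into two quick numbers: the data-local share (427 of 608, roughly 70%) and how much of each task's wall-clock time is actually spent on the CPU. The sketch below is a rough illustration only. The task counts and the 4 and 10 minute durations come from the thread; the per-task CPU time is a hypothetical value, chosen equal for both groups because the original post reports about the same "CPU time spent (ms)" on both halves.

```python
def data_local_fraction(data_local, total):
    """Share of tasks that ran data-local."""
    return data_local / total


def cpu_share(cpu_ms, wall_ms):
    """Fraction of a task's wall-clock time spent on the CPU."""
    return cpu_ms / wall_ms


# From the thread
DATA_LOCAL_TASKS, TOTAL_TASKS = 427, 608
FAST_WALL_MS = 4 * 60 * 1000    # about 4 minutes on the good nodes
SLOW_WALL_MS = 10 * 60 * 1000   # a bit over 10 minutes on the slow nodes

# Hypothetical: the thread says CPU time is about equal, not what it is
ASSUMED_CPU_MS = 200_000

if __name__ == "__main__":
    fast = cpu_share(ASSUMED_CPU_MS, FAST_WALL_MS)
    slow = cpu_share(ASSUMED_CPU_MS, SLOW_WALL_MS)
    print(f"data-local share: {data_local_fraction(DATA_LOCAL_TASKS, TOTAL_TASKS):.1%}")
    print(f"CPU share, fast task: {fast:.0%}")
    print(f"CPU share, slow task: {slow:.0%}")
    print(f"slow/fast ratio: {slow / fast:.2f}")
```

Whatever the real CPU time is, equal CPU time with 10 versus 4 minutes of wall-clock time means a slow task gets about 40% of the CPU share of a fast one, so the extra time is spent waiting rather than computing.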
P.S. I attached the job conf xml file. You are welcome to have a look.