Re: Strange machine behavior
Robert Dyer 2012-12-11, 03:33
On Sun, Dec 9, 2012 at 5:45 AM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I always set "vm.swappiness = 0" for my hadoop servers (PostgreSQL
> servers too).
>
I have just done this for that machine. So far, I have not seen a re-occurrence of the strange behavior; it appears this might have solved the problem.
> The reason is that Linux moves memory pages to swap space if they have not
> been accessed for a period of time (swapping). Java virtual machine (JVM)
> does not act well in the case of swapping that will make MapReduce (and
> HBase and ZooKeeper) run into trouble. So I would suggest to set
> vm.swappiness = 0.
>
> Thanks
> ac
>
> On 9 Dec 2012, at 12:58 PM, seth wrote:
>
> > Oracle frequently recommends vm.swappiness = 0 to get well behaved RAC
> > nodes. Otherwise you start paging out things you don't usually want paged
> > out in favor of a larger filesystem cache.
> >
> > There is also a vm parameter that controls the minimum size of the free
> > chain, might want to increase that a bit.
> >
> > Also, look into hosting your JVM heap on huge pages, they can't be paged
> > out and will help the JVM perform better too.
> >
> > On Dec 8, 2012, at 6:09 PM, Robert Dyer <[EMAIL PROTECTED]> wrote:
> >
> >> Has anyone experienced a TaskTracker/DataNode behaving like the
> >> attached image?
> >>
> >> This was during a MR job (which runs often). Note the extremely high
> >> System CPU time. Upon investigating I saw that out of 64GB ram the system
> >> had allocated almost 45GB to cache!
> >>
> >> I did a sudo sh -c "sync ; echo 3 > /proc/sys/vm/drop_caches ; sync"
> >> which is roughly where the graph goes back to normal (much lower System,
> >> much higher User).
> >>
> >> This has happened a few times.
> >>
> >> I have tried playing with the sysctl vm.swappiness value (default of
> >> 60) by setting it to 30 (which it was at when the graph was collected) and
> >> now to 10. I am not sure that helps.
> >>
> >> Any ideas? Anyone else run into this before?
> >>
> >> 24 cores
> >> 64GB ram
> >> 4x2TB sata3 hdd
> >>
> >> Running Hadoop 1.0.4, with a DataNode (2gb heap), TaskTracker (2gb
> >> heap) on this machine.
> >>
> >> 24 map slots (1gb heap each), no reducers.
> >>
> >> Also running HBase 0.94.2 with a RS (8gb ram) on this machine.
> >> <cpu-use.png>
> >
--
Robert Dyer [EMAIL PROTECTED]
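For anyone who wants to confirm the setting on a node, here is a minimal Python sketch that reads the value the kernel is currently using. It assumes a Linux host exposing the standard `/proc/sys/vm/swappiness` file; the function names and the `target` parameter are made up for illustration and are not part of Hadoop or any tool mentioned in this thread.

```python
from pathlib import Path

# Standard procfs location of the setting on Linux.
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def read_swappiness(path=SWAPPINESS_PATH):
    """Return the current vm.swappiness value as an integer."""
    return int(Path(path).read_text().strip())


def swappiness_report(value, target=0):
    """Compare the current value with a target (0 is the value suggested above)."""
    if value == target:
        return f"vm.swappiness is {value}, matching the target"
    return (
        f"vm.swappiness is {value}; to change it until the next reboot, run "
        f"'sysctl -w vm.swappiness={target}' as root"
    )
```

Calling `print(swappiness_report(read_swappiness()))` on the node prints a one-line summary. A value set with `sysctl -w` is lost at reboot; adding `vm.swappiness = 0` to `/etc/sysctl.conf` is the usual way to make it persistent.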
Re: Strange machine behavior
Andy Isaacson 2012-12-10, 19:23
What kernel did you see this on? Was there significant swap traffic (si/so in vmstat output) during the high-system-time period?
BTW, you don't need to nor do you want to run sync(1) when manipulating drop_caches, it just causes additional noise and slowdown. drop_caches doesn't have any impact on correctness; it won't cause data loss (by dropping a dirty page or whatever). I've had sync calls take 10 minutes to complete, so the unnecessary impact can be significant.
-andy
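If no vmstat log was being kept, the same swap counters can be sampled directly. The sketch below is only an illustration of the si/so check asked about above: it assumes a Linux kernel that exposes the cumulative `pswpin` and `pswpout` counters (pages swapped in and out since boot) in `/proc/vmstat`. The function names are hypothetical, and the rates are in pages per second rather than the KB/s that vmstat prints.

```python
import time


def parse_vmstat(text):
    """Parse '/proc/vmstat'-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(before, after, seconds):
    """Return pages swapped in/out per second between two samples."""
    return {
        key: (after.get(key, 0) - before.get(key, 0)) / seconds
        for key in ("pswpin", "pswpout")
    }


def sample_swap_rates(interval=5.0, path="/proc/vmstat"):
    """Take two samples 'interval' seconds apart and return the swap rates."""
    with open(path) as handle:
        first = parse_vmstat(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        second = parse_vmstat(handle.read())
    return swap_rates(first, second, interval)
```

Rates that stay at zero while System CPU time is high would suggest swap traffic is not the cause; sustained non-zero rates would point the other way.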
On Sat, Dec 8, 2012 at 4:09 PM, Robert Dyer <[EMAIL PROTECTED]> wrote:
> Has anyone experienced a TaskTracker/DataNode behaving like the attached
> image?
>
> This was during a MR job (which runs often). Note the extremely high System
> CPU time. Upon investigating I saw that out of 64GB ram the system had
> allocated almost 45GB to cache!
>
> I did a sudo sh -c "sync ; echo 3 > /proc/sys/vm/drop_caches ; sync" which is
> roughly where the graph goes back to normal (much lower System, much higher
> User).
>
> This has happened a few times.
>
> I have tried playing with the sysctl vm.swappiness value (default of 60) by
> setting it to 30 (which it was at when the graph was collected) and now to
> 10. I am not sure that helps.
>
> Any ideas? Anyone else run into this before?
>
> 24 cores
> 64GB ram
> 4x2TB sata3 hdd
>
> Running Hadoop 1.0.4, with a DataNode (2gb heap), TaskTracker (2gb heap) on
> this machine.
>
> 24 map slots (1gb heap each), no reducers.
>
> Also running HBase 0.94.2 with a RS (8gb ram) on this machine.
Re: Strange machine behavior
Robert Dyer 2012-12-11, 03:30
On Mon, Dec 10, 2012 at 1:23 PM, Andy Isaacson <[EMAIL PROTECTED]> wrote:
> What kernel did you see this on? Was there significant swap traffic
> (si/so in vmstat output) during the high-system-time period?
>
It's an older kernel, Fedora 15.
Linux XXXXX 2.6.43.8-1.fc15.x86_64 #1 SMP Mon Jun 4 20:33:44 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
The next time it happens I'll take a look at the vmstat output; I do not have that log for this last occurrence.
> BTW, you don't need to nor do you want to run sync(1) when
> manipulating drop_caches, it just causes additional noise and
> slowdown. drop_caches doesn't have any impact on correctness; it won't
> cause data loss (by dropping a dirty page or whatever). I've had sync
> calls take 10 minutes to complete, so the unnecessary impact can be
> significant.
>
> -andy
>
> On Sat, Dec 8, 2012 at 4:09 PM, Robert Dyer <[EMAIL PROTECTED]> wrote:
> > Has anyone experienced a TaskTracker/DataNode behaving like the attached
> > image?
> >
> > This was during a MR job (which runs often). Note the extremely high System
> > CPU time. Upon investigating I saw that out of 64GB ram the system had
> > allocated almost 45GB to cache!
> >
> > I did a sudo sh -c "sync ; echo 3 > /proc/sys/vm/drop_caches ; sync" which is
> > roughly where the graph goes back to normal (much lower System, much higher
> > User).
> >
> > This has happened a few times.
> >
> > I have tried playing with the sysctl vm.swappiness value (default of 60) by
> > setting it to 30 (which it was at when the graph was collected) and now to
> > 10. I am not sure that helps.
> >
> > Any ideas? Anyone else run into this before?
> >
> > 24 cores
> > 64GB ram
> > 4x2TB sata3 hdd
> >
> > Running Hadoop 1.0.4, with a DataNode (2gb heap), TaskTracker (2gb heap) on
> > this machine.
> >
> > 24 map slots (1gb heap each), no reducers.
> >
> > Also running HBase 0.94.2 with a RS (8gb ram) on this machine.
>
--
Robert Dyer [EMAIL PROTECTED]
Re: Strange machine behavior
Bharath Mundlapudi 2012-12-11, 02:06
Are you seeing any performance impact with this cache increase? It is normal for a Linux system to use a large share of memory for cache.
-Bharath
________________________________
From: Andy Isaacson <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Monday, December 10, 2012 11:23 AM
Subject: Re: Strange machine behavior
What kernel did you see this on? Was there significant swap traffic (si/so in vmstat output) during the high-system-time period?
BTW, you don't need to nor do you want to run sync(1) when manipulating drop_caches, it just causes additional noise and slowdown. drop_caches doesn't have any impact on correctness; it won't cause data loss (by dropping a dirty page or whatever). I've had sync calls take 10 minutes to complete, so the unnecessary impact can be significant.
-andy
On Sat, Dec 8, 2012 at 4:09 PM, Robert Dyer <[EMAIL PROTECTED]> wrote:
> Has anyone experienced a TaskTracker/DataNode behaving like the attached
> image?
>
> This was during a MR job (which runs often). Note the extremely high System
> CPU time. Upon investigating I saw that out of 64GB ram the system had
> allocated almost 45GB to cache!
>
> I did a sudo sh -c "sync ; echo 3 > /proc/sys/vm/drop_caches ; sync" which is
> roughly where the graph goes back to normal (much lower System, much higher
> User).
>
> This has happened a few times.
>
> I have tried playing with the sysctl vm.swappiness value (default of 60) by
> setting it to 30 (which it was at when the graph was collected) and now to
> 10. I am not sure that helps.
>
> Any ideas? Anyone else run into this before?
>
> 24 cores
> 64GB ram
> 4x2TB sata3 hdd
>
> Running Hadoop 1.0.4, with a DataNode (2gb heap), TaskTracker (2gb heap) on
> this machine.
>
> 24 map slots (1gb heap each), no reducers.
>
> Also running HBase 0.94.2 with a RS (8gb ram) on this machine.
Re: Strange machine behavior
Robert Dyer 2012-12-11, 03:32
Yes, there is a performance impact. It should be visible from the graph I attached. Basically, the CPU is spending much more time on System and the User time is lowered.
When this happens (if I don't do a drop_caches in time) the MR job winds up taking significantly longer than usual.
On Mon, Dec 10, 2012 at 8:06 PM, Bharath Mundlapudi <[EMAIL PROTECTED]> wrote:
> Are you seeing any performance impact with this cache increase? It is
> normal for a Linux system to use a large share of memory for cache.
>
> -Bharath
>
> ------------------------------
> *From:* Andy Isaacson <[EMAIL PROTECTED]>
> *To:* [EMAIL PROTECTED]
> *Sent:* Monday, December 10, 2012 11:23 AM
> *Subject:* Re: Strange machine behavior
>
> What kernel did you see this on? Was there significant swap traffic
> (si/so in vmstat output) during the high-system-time period?
>
> BTW, you don't need to nor do you want to run sync(1) when
> manipulating drop_caches, it just causes additional noise and
> slowdown. drop_caches doesn't have any impact on correctness; it won't
> cause data loss (by dropping a dirty page or whatever). I've had sync
> calls take 10 minutes to complete, so the unnecessary impact can be
> significant.
>
> -andy
>
> On Sat, Dec 8, 2012 at 4:09 PM, Robert Dyer <[EMAIL PROTECTED]> wrote:
> > Has anyone experienced a TaskTracker/DataNode behaving like the attached
> > image?
> >
> > This was during a MR job (which runs often). Note the extremely high System
> > CPU time. Upon investigating I saw that out of 64GB ram the system had
> > allocated almost 45GB to cache!
> >
> > I did a sudo sh -c "sync ; echo 3 > /proc/sys/vm/drop_caches ; sync" which is
> > roughly where the graph goes back to normal (much lower System, much higher
> > User).
> >
> > This has happened a few times.
> >
> > I have tried playing with the sysctl vm.swappiness value (default of 60) by
> > setting it to 30 (which it was at when the graph was collected) and now to
> > 10. I am not sure that helps.
> >
> > Any ideas? Anyone else run into this before?
> >
> > 24 cores
> > 64GB ram
> > 4x2TB sata3 hdd
> >
> > Running Hadoop 1.0.4, with a DataNode (2gb heap), TaskTracker (2gb heap) on
> > this machine.
> >
> > 24 map slots (1gb heap each), no reducers.
> >
> > Also running HBase 0.94.2 with a RS (8gb ram) on this machine.
> >
>
--
Robert Dyer [EMAIL PROTECTED]
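To put numbers on the System versus User shift without relying on the graph, the split can be computed from two samples of the aggregate `cpu` line in `/proc/stat`. This is a rough sketch under assumptions: it expects the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal), the function names are invented for illustration, and it is not what produced the attached graph.

```python
# Field order of the aggregate "cpu" line in /proc/stat on Linux.
CPU_FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return the aggregate CPU tick counters from /proc/stat content."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:1 + len(CPU_FIELDS)]]
            return dict(zip(CPU_FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_shares(before, after):
    """Return user and system time as percentages of all ticks between samples."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {name: 100.0 * deltas[name] / total for name in ("user", "system")}
```

Reading `/proc/stat` twice a few seconds apart and passing both parsed samples to `cpu_shares` gives the percentages for that window. A `system` share well above the `user` share during a map-only job would match the behavior described above.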