Jon Lederman
2011-01-03, 23:13
Ted Dunning
2011-01-03, 23:48
Jon Lederman
2011-01-04, 00:31
Ted Dunning
2011-01-04, 00:41
Jon Lederman
2011-01-04, 00:48
Ted Dunning
2011-01-04, 01:23
Konstantin Boudnik
2011-01-04, 04:47
-
Re: Entropy Pool and HDFS FS Commands Hanging System
Jon Lederman 2011-01-03, 23:13
Todd,

I have attached the jstack <pid of namenode> output. Does it appear to be stuck in SecureRandom as you noted as a possibility? I am not sure whether this is indicated in the following output:

sh-4.1# jps
4038 JobTracker
4160 Jps
3917 DataNode
4121 TaskTracker
3844 NameNode
3992 SecondaryNameNode
sh-4.1# jstack 3844
2011-01-03 15:07:01
Full thread dump OpenJDK Zero VM (14.0-b16 interpreted mode):

"Attach Listener" daemon prio=10 tid=0x0021a870 nid=0x106e waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

"3299256@qtp0-1" prio=10 tid=0x6ff2cee8 nid=0x1039 in Object.wait() [0x6f2fe000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x7dcb46a8> (a org.mortbay.thread.QueuedThreadPool$PoolThread)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:565)
    - locked <0x7dcb46a8> (a org.mortbay.thread.QueuedThreadPool$PoolThread)

"15020576@qtp0-0" prio=10 tid=0x6ff2ddd8 nid=0x1038 in Object.wait() [0x6f47e000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x7dcb4718> (a org.mortbay.thread.QueuedThreadPool$PoolThread)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:565)
    - locked <0x7dcb4718> (a org.mortbay.thread.QueuedThreadPool$PoolThread)

"org.apache.hadoop.hdfs.server.namenode.DecommissionManager$Monitor@955cd5" daemon prio=10 tid=0x6ff036f8 nid=0xffe waiting on condition [0x6f68e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hdfs.server.namenode.DecommissionManager$Monitor.run(DecommissionManager.java:65)
    at java.lang.Thread.run(Thread.java:636)

"org.apache.hadoop.hdfs.server.namenode.FSNamesystem$ReplicationMonitor@25c828" daemon prio=10 tid=0x6ff02230 nid=0xff9 waiting on condition [0x6f80e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$ReplicationMonitor.run(FSNamesystem.java:2327)
    at java.lang.Thread.run(Thread.java:636)

"org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@22ab57" daemon prio=10 tid=0x6ff00e00 nid=0xff8 waiting on condition [0x6f98e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:350)
    at java.lang.Thread.run(Thread.java:636)

"org.apache.hadoop.hdfs.server.namenode.FSNamesystem$HeartbeatMonitor@b1074a" daemon prio=10 tid=0x6ff009b0 nid=0xff7 waiting on condition [0x6fb0e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$HeartbeatMonitor.run(FSNamesystem.java:2309)
    at java.lang.Thread.run(Thread.java:636)

"org.apache.hadoop.hdfs.server.namenode.PendingReplicationBlocks$PendingReplicationMonitor@165f738" daemon prio=10 tid=0x001f66e8 nid=0xff6 waiting on condition [0x6fc9e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hdfs.server.namenode.PendingReplicationBlocks$PendingReplicationMonitor.run(PendingReplicationBlocks.java:186)
    at java.lang.Thread.run(Thread.java:636)

"Low Memory Detector" daemon prio=10 tid=0x000c09a8 nid=0xf50 runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x000bf1b8 nid=0xf4f runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x000af298 nid=0xf48 in Object.wait() [0x7063e000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x7daf8b40> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:133)
    - locked <0x7daf8b40> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:149)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:177)

"Reference Handler" daemon prio=10 tid=0x000aaa08 nid=0xf47 in Object.wait() [0x707be000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x7daf8bc8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:502)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
    - locked <0x7daf8bc8> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x000583c8 nid=0xf3f runnable [0xb729d000]
   java.lang.Thread.State: RUNNABLE
    at java.io.FileInputStream.readBytes(Native Method)
    at java.io.FileInputStream.read(FileInputStream.java:236)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    - locked <0x70e59ae8> (a java.io.BufferedInputStream)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    - locked <0x70e59970> (a java.io.BufferedInputStream)
    at sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedByte(SeedGenerator.java:469)
    at sun.security.provider.SeedGenerator.getSeedBytes(SeedGenerator.java:140)
    at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:135)
    at sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:131)
    at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:188)
    - locked <0x70e592c8> (a sun.security.provider.SecureR
-
Re: Entropy Pool and HDFS FS Commands Hanging System
Ted Dunning 2011-01-03, 23:48
Yes. It is stuck as suggested. See the bolded lines.

You can help avoid this by dumping additional entropy into the machine via network traffic. According to the man page for /dev/random you can cheat by writing goo into /dev/urandom, but I have been unable to verify that by experiment.

Is it really necessary to use /dev/random here? Again from the man page, there is a strong feeling in the community that only very long lived, high value keys really need to read from /dev/random. Session keys from /dev/urandom are fine.

I wrote an adaptation of the secure seed generator that doesn't block for Mahout. It is trivial, but might be useful to copy:

http://svn.apache.org/repos/asf/mahout/trunk/math/src/main/java/org/apache/mahout/common/DevURandomSeedGenerator.java

On Mon, Jan 3, 2011 at 3:13 PM, Jon Lederman <[EMAIL PROTECTED]> wrote:

> I have attached the jstack <pid of namenode> output. Does it appear to be
> stuck in SecureRandom as you noted as a possibility? I am not sure whether
> this is indicated in the following output:
>
> ...
> "main" prio=10 tid=0x000583c8 nid=0xf3f runnable [0xb729d000]
>    java.lang.Thread.State: RUNNABLE
> *   at java.io.FileInputStream.readBytes(Native Method)
> *   at java.io.FileInputStream.read(FileInputStream.java:236)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>     - locked <0x70e59ae8> (a java.io.BufferedInputStream)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>     - locked <0x70e59970> (a java.io.BufferedInputStream)
>     at sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedByte(SeedGenerator.java:469)
>     at sun.security.provider.SeedGenerator.getSeedBytes(SeedGenerator.java:140)
>     at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:135)
> *   at sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:131)
> *   at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:188)
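
For readers who cannot reach the linked file, the idea of a seed source that does not block can be sketched in a few lines of Java. This is only an illustration of the approach and is not the Mahout class itself: the class name `UrandomSeed` and the method `readSeed` are made up for this example, and it assumes a Linux-style /dev/urandom device.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hadoop or Mahout.
class UrandomSeed {

    // Read exactly `length` seed bytes from the given device or file.
    static byte[] readSeed(Path source, int length) throws IOException {
        byte[] seed = new byte[length];
        try (DataInputStream in = new DataInputStream(Files.newInputStream(source))) {
            in.readFully(seed); // throws EOFException if the source runs short
        }
        return seed;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: /dev/urandom exists and returns data without waiting
        // for the entropy pool to fill, unlike /dev/random in this thread.
        byte[] seed = readSeed(Paths.get("/dev/urandom"), 16);
        System.out.println("read " + seed.length + " seed bytes");
    }
}
```

A seed obtained this way would still have to be handed to whatever generator the application uses. Separately, the `java.security.egd` system property is the usual JVM-level setting for the seed source; whether it helps on this particular OpenJDK Zero build has not been checked here.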
-
Re: Entropy Pool and HDFS FS Commands Hanging System
Jon Lederman 2011-01-04, 00:31
Hi Ted,

Could you give me a bit more information on how I can overcome this issue? I am running Hadoop on an embedded processor and networking is turned off to the embedded processor. Is there a quick way to check whether this is in fact blocking on my system? And, are there some variables or configuration options I can set to avoid any potential blocking behavior?

Thanks.

-Jon
-
Re: Entropy Pool and HDFS FS Commands Hanging System
Ted Dunning 2011-01-04, 00:41
try

dd if=/dev/random bs=1 count=100 of=/dev/null

This will likely hang for a long time.

There is no way that I know of to change the behavior of /dev/random except by changing the file itself to point to a different minor device. That would be very bad form.

One thing you may be able to do is to pour lots of entropy into the system via /dev/urandom. I was not able to demonstrate this, though, when I just tried that. It would be nice if there were a config variable to set that would change this behavior, but right now, a code change is required (AFAIK).

Another thing to do is replace the use of SecureRandom with a version that uses /dev/urandom. That is the point of the code that I linked to. It provides a plugin replacement that will not block.
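
The same check as the dd command can be made from Java with a time limit, so the test itself cannot hang. The sketch below is an illustration only: `RandomReadProbe` and `readsWithin` are hypothetical names, and the 100-byte count and 5-second limit in `main` are arbitrary choices.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical probe, not part of Hadoop.
class RandomReadProbe {

    // Returns true only if `count` bytes were read from `device` before the
    // timeout. Returns false on timeout or if the source ended early.
    static boolean readsWithin(Path device, int count, long timeoutMillis) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "random-probe");
            t.setDaemon(true); // a stuck read must not keep the JVM alive
            return t;
        });
        Future<Boolean> result = pool.submit(() -> {
            try (InputStream in = Files.newInputStream(device)) {
                for (int i = 0; i < count; i++) {
                    if (in.read() < 0) {
                        return false; // end of data before `count` bytes
                    }
                }
                return true;
            }
        });
        try {
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            result.cancel(true); // interrupt the reader thread
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        boolean ok = readsWithin(Paths.get("/dev/random"), 100, 5000);
        System.out.println(ok ? "read completed" : "read did not complete in time");
    }
}
```

If the read does not complete in time on the affected machine while the same call against /dev/urandom does, that points at the entropy pool rather than at Hadoop.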
-
Re: Entropy Pool and HDFS FS Commands Hanging System
Jon Lederman 2011-01-04, 00:48
Thanks. Will try that. One final question: based on the jstack output I sent, is it obvious that the system is blocked due to the behavior of /dev/random? That is, can you point out what in the output I sent explicitly or implicitly indicates the blocking? I am trying to understand whether this is in fact the problem or whether there could be some other issue.

If I just let the FS command run (i.e., hadoop fs -ls), is there any guarantee it will eventually return in some relatively finite period of time such as hours, or could it potentially take days, weeks, years or eternity?

Thanks in advance.

-Jon
-
Re: Entropy Pool and HDFS FS Commands Hanging System
Ted Dunning 2011-01-04, 01:23
On Mon, Jan 3, 2011 at 4:48 PM, Jon Lederman <[EMAIL PROTECTED]> wrote:

> Thanks. Will try that. One final question, based on the jstack output I
> sent, is it obvious that the system is blocked due to the behavior of
> /dev/random?

I tried to send you a highlighted markup of your jstack output. The key thing to look for is a thread that is reading bytes inside a call nested under SecureRandom.

> If I just let the FS command run (i.e., hadoop fs -ls), is there any
> guarantee it will eventually return in some relatively finite period of time
> such as hours, or could it potentially take days, weeks, years or eternity?

It depends on how quiet your machine is. If it has stuff happening, then it will unwedge eventually.
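
That rule of thumb can be applied mechanically to a saved dump. The following Java sketch flags any thread whose stack contains both a native `FileInputStream.readBytes` frame and a `sun.security.provider.SeedGenerator` frame, which is the pattern in the "main" thread of the dump earlier in this thread. The class `SeedBlockFinder` is a made-up helper, and it assumes the dump has its original line breaks (one frame per line, thread headers starting with a double quote).

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for reading jstack output, not part of the JDK or Hadoop.
class SeedBlockFinder {

    // Returns the names of threads that are doing a native file read
    // somewhere beneath the SecureRandom seed generator.
    static List<String> threadsReadingSeed(String dump) {
        List<String> hits = new ArrayList<>();
        String current = null;
        boolean nativeRead = false;
        boolean seeding = false;
        for (String raw : dump.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("\"")) { // header of the next thread
                if (current != null && nativeRead && seeding) {
                    hits.add(current);
                }
                int end = line.indexOf('"', 1);
                current = end > 0 ? line.substring(1, end) : line;
                nativeRead = false;
                seeding = false;
            } else if (line.contains("java.io.FileInputStream.readBytes")) {
                nativeRead = true;
            } else if (line.contains("sun.security.provider.SeedGenerator")) {
                seeding = true;
            }
        }
        if (current != null && nativeRead && seeding) {
            hits.add(current); // last thread in the dump
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java SeedBlockFinder <saved-jstack-output>");
            return;
        }
        String dump = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println(threadsReadingSeed(dump));
    }
}
```

A match only says where the thread was at the moment of the dump. Taking two or three dumps a few seconds apart and seeing the same frames each time is a stronger sign that the read is really stuck.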
-
Re: Entropy Pool and HDFS FS Commands Hanging System
Konstantin Boudnik 2011-01-04, 04:47
Another way to fix it is to install rng-tools, which will allow you to increase the amount of entropy in your system.

--
Take care,
Konstantin (Cos) Boudnik
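
To see whether a change such as installing rng-tools has any effect, the kernel's entropy estimate can be read before and after. On Linux it is exposed as a plain number in /proc/sys/kernel/random/entropy_avail. The Java sketch below reads that counter; the class `EntropyCheck` and the method `readEntropyBits` are names invented for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hadoop or rng-tools.
class EntropyCheck {

    // Parse the single integer stored in the given counter file.
    static int readEntropyBits(Path counter) throws IOException {
        String text = new String(Files.readAllBytes(counter), StandardCharsets.US_ASCII);
        return Integer.parseInt(text.trim());
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a Linux kernel that exposes this procfs entry.
        int bits = readEntropyBits(Paths.get("/proc/sys/kernel/random/entropy_avail"));
        System.out.println("kernel entropy estimate: " + bits + " bits");
    }
}
```

A value that stays near zero on an idle machine with networking turned off would be consistent with the hang described in this thread.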