|
Evert Lammerts
2011-05-11, 09:23
James Seigel
2011-05-11, 12:54
Eric Fiala
2011-05-11, 12:57
Evert Lammerts
2011-05-11, 13:36
James Seigel
2011-05-11, 13:41
James Seigel
2011-05-11, 13:42
Allen Wittenauer
2011-05-11, 16:45
Michel Segel
2011-05-11, 17:22
Evert Lammerts
2011-05-13, 09:49
Evert Lammerts
2011-05-13, 09:54
Segel, Mike
2011-05-13, 12:36
Evert Lammerts
2011-05-13, 21:14
Segel, Mike
2011-05-13, 22:33
Evert Lammerts
2011-05-14, 08:53
Evert Lammerts
2011-05-14, 20:08
|
-
Stability issue - dead DN'sEvert Lammerts 2011-05-11, 09:23
Hi list,
I notice that whenever our Hadoop installation is put under a heavy load we lose one or two (on a total of five) datanodes. This results in IOExceptions, and affects the overall performance of the job being run. Can anybody give me advise or best practices on a different configuration to increase the stability? Below I've included the specs of the cluster, the hadoop related config and an example of when which things go wrong. Any help is very much appreciated, and if I can provide any other info please let me know. Cheers, Evert == What goes wrong, and when = See attached a screenshot of Ganglia when the cluster is under load of a single job. This job: * reads ~1TB from HDFS * writes ~200GB to HDFS * runs 288 Mappers and 35 Reducers When the job runs it takes all available Map and Reduce slots. The system starts swapping and there is a short time interval during which most cores are in WAIT. After that the job really starts running. At around half way, one or two datanodes become unreachable and are marked as dead nodes. The amount of under-replicated blocks becomes huge. Then some "java.io.IOException: Could not obtain block" are thrown in Mappers. The job does manage to finish successfully after around 3.5 hours, but my fear is that when we make the input much larger - which we want - the system becomes too unstable to finish the job. Maybe worth mentioning - never know what might help diagnostics. We notice that memory usage becomes less when we switch our keys from Text to LongWritable. Also, the Mappers are done in a fraction of the time. However, this for some reason results in much more network traffic and makes Reducers extremely slow. We're working on figuring out what causes this. == The cluster = We have a cluster that consists of 6 Sun Thumpers running Hadoop 0.20.2 on CentOS 5.5. One of them acts as NN and JT, the other 5 run DN's and TT's. Each node has: * 16GB RAM * 32GB swapspace * 4 cores * 11 LVM's of 4 x 500GB disks (2TB in total) for HDFS * non-HDFS stuff on separate disks * a 2x1GE bonded network interface for interconnects * a 2x1GE bonded network interface for external access I realize that this is not a well balanced system, but it's what we had available for a prototype environment. We're working on putting together a specification for a much larger production environment. == Hadoop config = Here some properties that I think might be relevant: __CORE-SITE.XML__ fs.inmemory.size.mb: 200 mapreduce.task.io.sort.factor: 100 mapreduce.task.io.sort.mb: 200 # 1024*1024*4 MB, blocksize of the LVM's io.file.buffer.size: 4194304 __HDFS-SITE.XML__ # 1024*1024*4*32 MB, 32 times the blocksize of the LVM's dfs.block.size: 134217728 # Only 5 DN's, but this shouldn't hurt dfs.namenode.handler.count: 40 # This got rid of the occasional "Could not obtain block"'s dfs.datanode.max.xcievers: 4096 __MAPRED-SITE.XML__ mapred.tasktracker.map.tasks.maximum: 4 mapred.tasktracker.reduce.tasks.maximum: 4 mapred.child.java.opts: -Xmx2560m mapreduce.reduce.shuffle.parallelcopies: 20 mapreduce.map.java.opts: -Xmx512m mapreduce.reduce.java.opts: -Xmx512m # Compression codecs are configured and seem to work fine mapred.compress.map.output: true mapred.map.output.compression.codec: com.hadoop.compression.lzo.LzoCodec
-
Re: Stability issue - dead DN'sJames Seigel 2011-05-11, 12:54
Evert,
What’s the stack trace and what version of hadoop do you have installed Sir! James. On 2011-05-11, at 3:23 AM, Evert Lammerts wrote: > Hi list, > > I notice that whenever our Hadoop installation is put under a heavy load we lose one or two (on a total of five) datanodes. This results in IOExceptions, and affects the overall performance of the job being run. Can anybody give me advise or best practices on a different configuration to increase the stability? Below I've included the specs of the cluster, the hadoop related config and an example of when which things go wrong. Any help is very much appreciated, and if I can provide any other info please let me know. > > Cheers, > Evert > > == What goes wrong, and when => > See attached a screenshot of Ganglia when the cluster is under load of a single job. This job: > * reads ~1TB from HDFS > * writes ~200GB to HDFS > * runs 288 Mappers and 35 Reducers > > When the job runs it takes all available Map and Reduce slots. The system starts swapping and there is a short time interval during which most cores are in WAIT. After that the job really starts running. At around half way, one or two datanodes become unreachable and are marked as dead nodes. The amount of under-replicated blocks becomes huge. Then some "java.io.IOException: Could not obtain block" are thrown in Mappers. The job does manage to finish successfully after around 3.5 hours, but my fear is that when we make the input much larger - which we want - the system becomes too unstable to finish the job. > > Maybe worth mentioning - never know what might help diagnostics. We notice that memory usage becomes less when we switch our keys from Text to LongWritable. Also, the Mappers are done in a fraction of the time. However, this for some reason results in much more network traffic and makes Reducers extremely slow. We're working on figuring out what causes this. > > > == The cluster => > We have a cluster that consists of 6 Sun Thumpers running Hadoop 0.20.2 on CentOS 5.5. One of them acts as NN and JT, the other 5 run DN's and TT's. Each node has: > * 16GB RAM > * 32GB swapspace > * 4 cores > * 11 LVM's of 4 x 500GB disks (2TB in total) for HDFS > * non-HDFS stuff on separate disks > * a 2x1GE bonded network interface for interconnects > * a 2x1GE bonded network interface for external access > > I realize that this is not a well balanced system, but it's what we had available for a prototype environment. We're working on putting together a specification for a much larger production environment. > > > == Hadoop config => > Here some properties that I think might be relevant: > > __CORE-SITE.XML__ > > fs.inmemory.size.mb: 200 > mapreduce.task.io.sort.factor: 100 > mapreduce.task.io.sort.mb: 200 > # 1024*1024*4 MB, blocksize of the LVM's > io.file.buffer.size: 4194304 > > __HDFS-SITE.XML__ > > # 1024*1024*4*32 MB, 32 times the blocksize of the LVM's > dfs.block.size: 134217728 > # Only 5 DN's, but this shouldn't hurt > dfs.namenode.handler.count: 40 > # This got rid of the occasional "Could not obtain block"'s > dfs.datanode.max.xcievers: 4096 > > __MAPRED-SITE.XML__ > > mapred.tasktracker.map.tasks.maximum: 4 > mapred.tasktracker.reduce.tasks.maximum: 4 > mapred.child.java.opts: -Xmx2560m > mapreduce.reduce.shuffle.parallelcopies: 20 > mapreduce.map.java.opts: -Xmx512m > mapreduce.reduce.java.opts: -Xmx512m > # Compression codecs are configured and seem to work fine > mapred.compress.map.output: true > mapred.map.output.compression.codec: com.hadoop.compression.lzo.LzoCodec >
-
Re: Stability issue - dead DN'sEric Fiala 2011-05-11, 12:57
Evert,
>From my experience, hadoop loathes swap and you mention that all reduces and mappers are running (8 total) and from the ganglia screenshot I see that you have a thick crest of that purple swap. If we do the math that means [ map.tasks.max * mapred.child.java.opts ] + [ reduce.tasks.max * mapred.child.java.opts ] => or [ 4 * 2.5G ] + [ 4 * 2.5G ] is greater than the amount of physical RAM in the machine. This doesn't account for the base tasktracker and datanode process + OS overhead and whatever else may be hoarding resources on the systems. I would play with this ratio, either less maps / reduces max - or lower your child.java.opts so that when you are fully subscribed you are not using more resource than the machine can offer. Also, setting mapred.reduce.slowstart.completed.maps to 1.00 or some other value close to 1 would be one way to guarantee only 4 either maps or reduces to be running at once and address (albeit in a duct tape like way) the oversubscription problem you are seeing (this represents the fractions of maps that should complete before initiating the reduce phase). hth EF On Wed, May 11, 2011 at 3:23 AM, Evert Lammerts <[EMAIL PROTECTED]>wrote: > Hi list, > > I notice that whenever our Hadoop installation is put under a heavy load we > lose one or two (on a total of five) datanodes. This results in > IOExceptions, and affects the overall performance of the job being run. Can > anybody give me advise or best practices on a different configuration to > increase the stability? Below I've included the specs of the cluster, the > hadoop related config and an example of when which things go wrong. Any help > is very much appreciated, and if I can provide any other info please let me > know. > > Cheers, > Evert > > == What goes wrong, and when => > See attached a screenshot of Ganglia when the cluster is under load of a > single job. This job: > * reads ~1TB from HDFS > * writes ~200GB to HDFS > * runs 288 Mappers and 35 Reducers > > When the job runs it takes all available Map and Reduce slots. The system > starts swapping and there is a short time interval during which most cores > are in WAIT. After that the job really starts running. At around half way, > one or two datanodes become unreachable and are marked as dead nodes. The > amount of under-replicated blocks becomes huge. Then some > "java.io.IOException: Could not obtain block" are thrown in Mappers. The job > does manage to finish successfully after around 3.5 hours, but my fear is > that when we make the input much larger - which we want - the system becomes > too unstable to finish the job. > > Maybe worth mentioning - never know what might help diagnostics. We notice > that memory usage becomes less when we switch our keys from Text to > LongWritable. Also, the Mappers are done in a fraction of the time. However, > this for some reason results in much more network traffic and makes Reducers > extremely slow. We're working on figuring out what causes this. > > > == The cluster => > We have a cluster that consists of 6 Sun Thumpers running Hadoop 0.20.2 on > CentOS 5.5. One of them acts as NN and JT, the other 5 run DN's and TT's. > Each node has: > * 16GB RAM > * 32GB swapspace > * 4 cores > * 11 LVM's of 4 x 500GB disks (2TB in total) for HDFS > * non-HDFS stuff on separate disks > * a 2x1GE bonded network interface for interconnects > * a 2x1GE bonded network interface for external access > > I realize that this is not a well balanced system, but it's what we had > available for a prototype environment. We're working on putting together a > specification for a much larger production environment. > > > == Hadoop config => > Here some properties that I think might be relevant: > > __CORE-SITE.XML__ > > fs.inmemory.size.mb: 200 > mapreduce.task.io.sort.factor: 100 > mapreduce.task.io.sort.mb: 200 > # 1024*1024*4 MB, blocksize of the LVM's > io.file.buffer.size: 4194304 > > __HDFS-SITE.XML__ > > # 1024*1024*4*32 MB, 32 times the blocksize of the LVM's
-
RE: Stability issue - dead DN'sEvert Lammerts 2011-05-11, 13:36
Hi James,
Hadoop version is 0.20.2 (find that and more on our setup also in my first mail, under heading "The cluster"). Below I) an example stacktrace of losing a datanode is and II) an example of a "Could not obtain block" IOException. Cheers, Evert 11/05/11 15:06:43 INFO hdfs.DFSClient: Failed to connect to /192.168.28.214:50050, add to deadNodes and continue java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.28.209:50726 remote=/192.168.28.214:50050] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.DataInputStream.readShort(DataInputStream.java:295) at org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1478) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1811) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1948) at java.io.DataInputStream.readFully(DataInputStream.java:178) at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63) at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101) at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1945) at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1845) at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1891) at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:95) at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:1) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130) at nl.liacs.infrawatch.hadoop.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:85) at nl.liacs.infrawatch.hadoop.kmeans.Job.run(Job.java:171) at nl.liacs.infrawatch.hadoop.kmeans.Job.main(Job.java:74) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:186) 11/05/10 09:43:47 INFO mapred.JobClient: map 82% reduce 17% 11/05/10 09:44:39 INFO mapred.JobClient: Task Id : attempt_201104121440_0122_m_000225_0, Status : FAILED java.io.IOException: Could not obtain block: blk_4397122445076815438_4097927 file=/user/joaquin/data/20081201/20081201.039 at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1993) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1800) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1948) at java.io.DataInputStream.read(DataInputStream.java:83) at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134) at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:97) at nl.liacs.infrawatch.hadoop.kmeans.KeyValueLineRecordReader.nextKeyValue(KeyValueLineRecordReader.java:94) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:455) at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115) at org.apache.hadoop.mapred.Child.main(Child.java:262)
-
Re: Stability issue - dead DN'sJames Seigel 2011-05-11, 13:41
I noticed you crossposted to cloudera, what version of theirs are you running?
Cheers James Sent from my mobile. Please excuse the typos. On 2011-05-11, at 7:39 AM, Evert Lammerts <[EMAIL PROTECTED]> wrote: > Hi James, > > Hadoop version is 0.20.2 (find that and more on our setup also in my first mail, under heading "The cluster"). > > Below I) an example stacktrace of losing a datanode is and II) an example of a "Could not obtain block" IOException. > > Cheers, > Evert > > 11/05/11 15:06:43 INFO hdfs.DFSClient: Failed to connect to /192.168.28.214:50050, add to deadNodes and continue > java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : > java.nio.channels.SocketChannel[connected local=/192.168.28.209:50726 remote=/192.168.28.214:50050] > at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) > at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155) > at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) > at java.io.BufferedInputStream.read(BufferedInputStream.java:237) > at java.io.DataInputStream.readShort(DataInputStream.java:295) > at org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1478) > at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1811) > at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1948) > at java.io.DataInputStream.readFully(DataInputStream.java:178) > at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63) > at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101) > at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1945) > at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1845) > at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1891) > at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:95) > at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:1) > at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135) > at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130) > at nl.liacs.infrawatch.hadoop.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:85) > at nl.liacs.infrawatch.hadoop.kmeans.Job.run(Job.java:171) > at nl.liacs.infrawatch.hadoop.kmeans.Job.main(Job.java:74) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:186) > > > 11/05/10 09:43:47 INFO mapred.JobClient: map 82% reduce 17% 11/05/10 09:44:39 INFO mapred.JobClient: Task Id : > attempt_201104121440_0122_m_000225_0, Status : FAILED > java.io.IOException: Could not obtain block: > blk_4397122445076815438_4097927 > file=/user/joaquin/data/20081201/20081201.039 > at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1993) > at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1800) > at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1948) > at java.io.DataInputStream.read(DataInputStream.java:83) > at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134) > at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:97) > at nl.liacs.infrawatch.hadoop.kmeans.KeyValueLineRecordReader.nextKeyValue(KeyValueLineRecordReader.java:94)
-
Re: Stability issue - dead DN'sJames Seigel 2011-05-11, 13:42
Have you dug into the dead DN logs as well?
James Sent from my mobile. Please excuse the typos. On 2011-05-11, at 7:39 AM, Evert Lammerts <[EMAIL PROTECTED]> wrote: > Hi James, > > Hadoop version is 0.20.2 (find that and more on our setup also in my first mail, under heading "The cluster"). > > Below I) an example stacktrace of losing a datanode is and II) an example of a "Could not obtain block" IOException. > > Cheers, > Evert > > 11/05/11 15:06:43 INFO hdfs.DFSClient: Failed to connect to /192.168.28.214:50050, add to deadNodes and continue > java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : > java.nio.channels.SocketChannel[connected local=/192.168.28.209:50726 remote=/192.168.28.214:50050] > at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) > at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155) > at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) > at java.io.BufferedInputStream.read(BufferedInputStream.java:237) > at java.io.DataInputStream.readShort(DataInputStream.java:295) > at org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1478) > at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1811) > at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1948) > at java.io.DataInputStream.readFully(DataInputStream.java:178) > at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63) > at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101) > at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1945) > at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1845) > at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1891) > at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:95) > at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:1) > at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135) > at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130) > at nl.liacs.infrawatch.hadoop.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:85) > at nl.liacs.infrawatch.hadoop.kmeans.Job.run(Job.java:171) > at nl.liacs.infrawatch.hadoop.kmeans.Job.main(Job.java:74) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:186) > > > 11/05/10 09:43:47 INFO mapred.JobClient: map 82% reduce 17% 11/05/10 09:44:39 INFO mapred.JobClient: Task Id : > attempt_201104121440_0122_m_000225_0, Status : FAILED > java.io.IOException: Could not obtain block: > blk_4397122445076815438_4097927 > file=/user/joaquin/data/20081201/20081201.039 > at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1993) > at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1800) > at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1948) > at java.io.DataInputStream.read(DataInputStream.java:83) > at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134) > at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:97) > at nl.liacs.infrawatch.hadoop.kmeans.KeyValueLineRecordReader.nextKeyValue(KeyValueLineRecordReader.java:94)
-
Re: Stability issue - dead DN'sAllen Wittenauer 2011-05-11, 16:45
On May 11, 2011, at 5:57 AM, Eric Fiala wrote: > > If we do the math that means [ map.tasks.max * mapred.child.java.opts ] + > [ reduce.tasks.max * mapred.child.java.opts ] => or [ 4 * 2.5G ] + [ 4 * > 2.5G ] is greater than the amount of physical RAM in the machine. > This doesn't account for the base tasktracker and datanode process + OS > overhead and whatever else may be hoarding resources on the systems. +1 to what Eric said. You've exhausted memory and now the whole system is falling apart. > I would play with this ratio, either less maps / reduces max - or lower your > child.java.opts so that when you are fully subscribed you are not using > more resource than the machine can offer. Yup. > Also, setting mapred.reduce.slowstart.completed.maps to 1.00 or some other > value close to 1 would be one way to guarantee only 4 either maps or reduces > to be running at once and address (albeit in a duct tape like way) the > oversubscription problem you are seeing (this represents the fractions of > maps that should complete before initiating the reduce phase). slowstart isn't really going to help you much here. All it takes is another job with the same settings running at the same time and processes will start dying again. That said, the default for slowstart is incredibly stupid for the vast majority. Something closer to .70 or .80 is more realistic. >> * a 2x1GE bonded network interface for interconnects >> * a 2x1GE bonded network interface for external access Multiple NICs on a box can sometimes cause big performance problems with Hadoop. So watch your traffic carefully.
-
Re: Stability issue - dead DN'sMichel Segel 2011-05-11, 17:22
You really really don't want to do this.
Long story short... It won't work. Just a suggestion.. You don't want anyone on your cluster itself. They should interact wit edge nodes, which are 'Hadoop aware'. Then your cluster has a single network to worry about. Sent from a remote device. Please excuse any typos... Mike Segel On May 11, 2011, at 11:45 AM, Allen Wittenauer <[EMAIL PROTECTED]> wrote: > > > > >>> * a 2x1GE bonded network interface for interconnects >>> * a 2x1GE bonded network interface for external access > > Multiple NICs on a box can sometimes cause big performance problems with Hadoop. So watch your traffic carefully. > > >
-
RE: Stability issue - dead DN'sEvert Lammerts 2011-05-13, 09:49
> From my experience, hadoop loathes swap and you mention that all reduces and mappers are running (8 total) and from the ganglia screenshot I see that you have a thick crest of that purple swap.
I know, it's ugly isn't it :) My understanding is that this is partly due to forked processes though. > If we do the math that means [ map.tasks.max * mapred.child.java.opts ] + [ reduce.tasks.max * mapred.child.java.opts ] => or [ 4 * 2.5G ] + [ 4 * 2.5G ] is greater than the amount of physical RAM in the machine. > This doesn't account for the base tasktracker and datanode process + OS overhead and whatever else may be hoarding resources on the systems. This makes me feel stupid :) Your right, I've just screwed it down, we'll see how it performs now. > I would play with this ratio, either less maps / reduces max - or lower your child.java.opts so that when you are fully subscribed you are not using more resource than the machine can offer. > Also, setting mapred.reduce.slowstart.completed.maps to 1.00 or some other value close to 1 would be one way to guarantee only 4 either maps or reduces to be running at once and address (albeit in a duct tape like way) the oversubscription problem you are seeing (this represents the fractions of maps that should complete before initiating the reduce phase). This is a new one for me. I get Allen's point that on a multi tenant cluster this won't fix the problem, but the default is definitely not a good one. Starting reduce tasks as soon as map tasks start running is hardly ever useful, and just takes up slots that could be used by others. Thanks a bunch for the suggestions! Cheers, Evert On Wed, May 11, 2011 at 3:23 AM, Evert Lammerts <[EMAIL PROTECTED]> wrote: Hi list, I notice that whenever our Hadoop installation is put under a heavy load we lose one or two (on a total of five) datanodes. This results in IOExceptions, and affects the overall performance of the job being run. Can anybody give me advise or best practices on a different configuration to increase the stability? Below I've included the specs of the cluster, the hadoop related config and an example of when which things go wrong. Any help is very much appreciated, and if I can provide any other info please let me know. Cheers, Evert == What goes wrong, and when = See attached a screenshot of Ganglia when the cluster is under load of a single job. This job: * reads ~1TB from HDFS * writes ~200GB to HDFS * runs 288 Mappers and 35 Reducers When the job runs it takes all available Map and Reduce slots. The system starts swapping and there is a short time interval during which most cores are in WAIT. After that the job really starts running. At around half way, one or two datanodes become unreachable and are marked as dead nodes. The amount of under-replicated blocks becomes huge. Then some "java.io.IOException: Could not obtain block" are thrown in Mappers. The job does manage to finish successfully after around 3.5 hours, but my fear is that when we make the input much larger - which we want - the system becomes too unstable to finish the job. Maybe worth mentioning - never know what might help diagnostics. We notice that memory usage becomes less when we switch our keys from Text to LongWritable. Also, the Mappers are done in a fraction of the time. However, this for some reason results in much more network traffic and makes Reducers extremely slow. We're working on figuring out what causes this. == The cluster = We have a cluster that consists of 6 Sun Thumpers running Hadoop 0.20.2 on CentOS 5.5. One of them acts as NN and JT, the other 5 run DN's and TT's. Each node has: * 16GB RAM * 32GB swapspace * 4 cores * 11 LVM's of 4 x 500GB disks (2TB in total) for HDFS * non-HDFS stuff on separate disks * a 2x1GE bonded network interface for interconnects * a 2x1GE bonded network interface for external access I realize that this is not a well balanced system, but it's what we had available for a prototype environment. We're working on putting together a specification for a much larger production environment. == Hadoop config = Here some properties that I think might be relevant: __CORE-SITE.XML__ fs.inmemory.size.mb: 200 mapreduce.task.io.sort.factor: 100 mapreduce.task.io.sort.mb: 200 # 1024*1024*4 MB, blocksize of the LVM's io.file.buffer.size: 4194304 __HDFS-SITE.XML__ # 1024*1024*4*32 MB, 32 times the blocksize of the LVM's dfs.block.size: 134217728 # Only 5 DN's, but this shouldn't hurt dfs.namenode.handler.count: 40 # This got rid of the occasional "Could not obtain block"'s dfs.datanode.max.xcievers: 4096 __MAPRED-SITE.XML__ mapred.tasktracker.map.tasks.maximum: 4 mapred.tasktracker.reduce.tasks.maximum: 4 mapred.child.java.opts: -Xmx2560m mapreduce.reduce.shuffle.parallelcopies: 20 mapreduce.map.java.opts: -Xmx512m mapreduce.reduce.java.opts: -Xmx512m # Compression codecs are configured and seem to work fine mapred.compress.map.output: true mapred.map.output.compression.codec: com.hadoop.compression.lzo.LzoCodec
-
RE: Stability issue - dead DN'sEvert Lammerts 2011-05-13, 09:54
Hi Mike,
> You really really don't want to do this. > Long story short... It won't work. Can you elaborate? Are you talking about the bonded interfaces or about having a separated network for interconnects and external network? What can go wrong there? > > Just a suggestion.. You don't want anyone on your cluster itself. They > should interact wit edge nodes, which are 'Hadoop aware'. Then your > cluster has a single network to worry about. That's our current setup. We have a single headnode that is used as a SPOE. However, I'd like to change that on our future production system. We want to implement Kerberos for authentication, and let users interact with the cluster from their own machine. This would enable them to submit their jobs from the local IDE. The only way to do this is by opening up Hadoop ports for the world, is my understanding: if people interact with HDFS they need to be able to interact with all nodes, right? What would be the argument against this? Cheers, Evert > > > Sent from a remote device. Please excuse any typos... > > Mike Segel > > On May 11, 2011, at 11:45 AM, Allen Wittenauer <[EMAIL PROTECTED]> wrote: > > > > > > > > > > >>> * a 2x1GE bonded network interface for interconnects > >>> * a 2x1GE bonded network interface for external access > > > > Multiple NICs on a box can sometimes cause big performance > problems with Hadoop. So watch your traffic carefully. > > > > > >
-
Re: Stability issue - dead DN'sSegel, Mike 2011-05-13, 12:36
Bonded will work but you may not see the performance you would expect. If you need >1 GBe, go 10GBe less headache and has even more headroom.
Multiple interfaces won't work. Or I should say didn't work in past releases. If you think about it, clients have to connect to each node. So having two interfaces and trying to manage them makes no sense. Add to this trying to manage this in DNS ... Why make more work for yourself? Going from memory... It looked like you rDNS had to match you hostnames so your internal interfaces had to match hostnames so you had an inverted network. If you draw out your network topology you end up with a ladder. You would be better off (IMHO) to create a subnet where only your edge servers are dual nic'd. But then if your cluster is for development... Now your PCs can't be used as clients... Does this make sense? Sent from a remote device. Please excuse any typos... Mike Segel On May 13, 2011, at 4:57 AM, "Evert Lammerts" <[EMAIL PROTECTED]> wrote: > Hi Mike, > >> You really really don't want to do this. >> Long story short... It won't work. > > Can you elaborate? Are you talking about the bonded interfaces or about having a separated network for interconnects and external network? What can go wrong there? > >> >> Just a suggestion.. You don't want anyone on your cluster itself. They >> should interact wit edge nodes, which are 'Hadoop aware'. Then your >> cluster has a single network to worry about. > > That's our current setup. We have a single headnode that is used as a SPOE. However, I'd like to change that on our future production system. We want to implement Kerberos for authentication, and let users interact with the cluster from their own machine. This would enable them to submit their jobs from the local IDE. The only way to do this is by opening up Hadoop ports for the world, is my understanding: if people interact with HDFS they need to be able to interact with all nodes, right? What would be the argument against this? > > Cheers, > Evert > >> >> >> Sent from a remote device. Please excuse any typos... >> >> Mike Segel >> >> On May 11, 2011, at 11:45 AM, Allen Wittenauer <[EMAIL PROTECTED]> wrote: >> >>> >>> >>> >>> >>>>> * a 2x1GE bonded network interface for interconnects >>>>> * a 2x1GE bonded network interface for external access >>> >>> Multiple NICs on a box can sometimes cause big performance >> problems with Hadoop. So watch your traffic carefully. >>> >>> >>> The information contained in this communication may be CONFIDENTIAL and is intended only for the use of the recipient(s) named above. If you are not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited. If you have received this communication in error, please notify the sender and delete/destroy the original message and any copy of it from your computer or paper files.
-
RE: Stability issue - dead DN'sEvert Lammerts 2011-05-13, 21:14
Hi Mike,
Thanks for trying to help out. I had a talk with our networking guys this afternoon. According to them (and this is way out of my area of expertise, so excuse any mistakes) multiple interfaces shouldn't be a problem. We could set up a nameserver to resolve hostnames to addresses in our private space when the request comes from one of the nodes, and route this traffic over a single interface. Any other request can be resolved to an address in the public space, which is bound to an other interface. In our current setup we're not even resolving hostnames in our private address space through a nameserver - we do it with an ugly hack in /etc/hosts. And it seems to work alright. Having said that, our problems are still not completely gone even after adjusting the maximum allowed RAM for tasks - although things are lots better. While writing this mail three out of five DN's were marked as dead. There still is some swapping going on, but the cores are not spending any time in WAIT, so this shouldn't be the cause of anything. See below a trace from a dead DN - any thoughts are appreciated! Cheers, Evert 2011-05-13 23:13:27,716 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-9131821326787012529_2915672 src: /192.168.28.211:60136 dest: /192.168.28.214:50050 of size 382425 2011-05-13 23:13:27,915 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9132067116195286882_130888 java.io.EOFException: while trying to read 3744913 bytes 2011-05-13 23:13:27,925 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.28.214:50050, dest: /192.168.28.214:35139, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201105131125_0025_m_001437_0, offset: 196608, srvID: DS-443352839-145.100.2.183-50050-1291128673616, blockid: blk_-9163184839986480695_4112368, duration: 6254000 2011-05-13 23:13:28,032 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-9149862728087355005_3793421 src: /192.168.28.210:41197 dest: /192.168.28.214:50050 of size 245767 2011-05-13 23:13:28,033 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-9132067116195286882_130888 unfinalized and removed. 2011-05-13 23:13:28,033 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-9132067116195286882_130888 received exception java.io.EOFException: while trying to read 3744913 bytes 2011-05-13 23:13:28,033 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.28.214:50050, storageID=DS-443352839-145.100.2.183-50050-1291128673616, infoPort=50075, ipcPort=50020):DataXceiver java.io.EOFException: while trying to read 3744913 bytes at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:270) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:357) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:378) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:534) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:417) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122) 2011-05-13 23:13:28,038 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.28.214:50050, dest: /192.168.28.214:32910, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201105131125_0025_m_001443_0, offset: 197632, srvID: DS-443352839-145.100.2.183-50050-1291128673616, blockid: blk_-9163184839986480695_4112368, duration: 4323000 2011-05-13 23:13:28,038 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.28.214:50050, dest: /192.168.28.214:35138, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201105131125_0025_m_001440_0, offset: 197120, srvID: DS-443352839-145.100.2.183-50050-1291128673616, blockid: blk_-9163184839986480695_4112368, duration: 5573000 2011-05-13 23:13:28,159 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.28.214:50050, dest: /192.168.28.212:38574, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201105131125_0025_m_001444_0, offset: 197632, srvID: DS-443352839-145.100.2.183-50050-1291128673616, blockid: blk_-9163184839986480695_4112368, duration: 16939000 2011-05-13 23:13:28,209 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-9123390874940601805_2898225 src: /192.168.28.210:44227 dest: /192.168.28.214:50050 of size 300441 2011-05-13 23:13:28,217 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.28.214:50050, dest: /192.168.28.213:42364, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201105131125_0025_m_001451_0, offset: 198656, srvID: DS-443352839-145.100.2.183-50050-1291128673616, blockid: blk_-9163184839986480695_4112368, duration: 5291000 2011-05-13 23:13:28,252 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.28.214:50050, dest: /192.168.28.214:32930, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201105131125_0025_m_001436_0, offset: 0, srvID: DS-443352839-145.100.2.183-50050-1291128673616, blockid: blk_-1800696633107072247_4099834, duration: 5099000 2011-05-13 23:13:28,256 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.28.214:50050, dest: /192.168.28.213:42363, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201105131125_0025_m_001458_0, offset: 199680, srvID: DS-443352839-145.100.2.183-50050-1291128673616, blockid: blk_-9163184839986480695_4112368, duration: 4945000 2011-05-13 23:13:28,257 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.28.214:50050, dest: /192.168.28.214:35137, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201105131125_0025_m_001436_0, offset: 196608, srvID: DS-443352839-145.100.2.183-50050-1291128673616, blockid: blk_-9163184839986480695_4112368, duration: 4159000 2011-05-1
-
Re: Stability issue - dead DN'sSegel, Mike 2011-05-13, 22:33
Ok...
Hum, look, I've been force fed a couple of margaritas so, my memory is a bit foggy... You say your clients connect on nic A. Your cluster connects on nic B. What happens when you want to upload a file from your client to HDFS? Or even access it? ... ;-) Sent from a remote device. Please excuse any typos... Mike Segel On May 13, 2011, at 4:15 PM, "Evert Lammerts" <[EMAIL PROTECTED]> wrote: > Hi Mike, > > Thanks for trying to help out. > > I had a talk with our networking guys this afternoon. According to them (and this is way out of my area of expertise, so excuse any mistakes) multiple interfaces shouldn't be a problem. We could set up a nameserver to resolve hostnames to addresses in our private space when the request comes from one of the nodes, and route this traffic over a single interface. Any other request can be resolved to an address in the public space, which is bound to an other interface. In our current setup we're not even resolving hostnames in our private address space through a nameserver - we do it with an ugly hack in /etc/hosts. And it seems to work alright. > > Having said that, our problems are still not completely gone even after adjusting the maximum allowed RAM for tasks - although things are lots better. While writing this mail three out of five DN's were marked as dead. There still is some swapping going on, but the cores are not spending any time in WAIT, so this shouldn't be the cause of anything. See below a trace from a dead DN - any thoughts are appreciated! > > Cheers, > Evert > > 2011-05-13 23:13:27,716 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-9131821326787012529_2915672 src: /192.168.28.211:60136 dest: /192.168.28.214:50050 of size 382425 > 2011-05-13 23:13:27,915 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9132067116195286882_130888 java.io.EOFException: while trying to read 3744913 bytes > 2011-05-13 23:13:27,925 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.28.214:50050, dest: /192.168.28.214:35139, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201105131125_0025_m_001437_0, offset: 196608, srvID: DS-443352839-145.100.2.183-50050-1291128673616, blockid: blk_-9163184839986480695_4112368, duration: 6254000 > 2011-05-13 23:13:28,032 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-9149862728087355005_3793421 src: /192.168.28.210:41197 dest: /192.168.28.214:50050 of size 245767 > 2011-05-13 23:13:28,033 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-9132067116195286882_130888 unfinalized and removed. > 2011-05-13 23:13:28,033 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-9132067116195286882_130888 received exception java.io.EOFException: while trying to read 3744913 bytes > 2011-05-13 23:13:28,033 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.28.214:50050, storageID=DS-443352839-145.100.2.183-50050-1291128673616, infoPort=50075, ipcPort=50020):DataXceiver > java.io.EOFException: while trying to read 3744913 bytes > at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:270) > at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:357) > at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:378) > at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:534) > at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:417) > at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122) > 2011-05-13 23:13:28,038 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.28.214:50050, dest: /192.168.28.214:32910, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201105131125_0025_m_001443_0, offset: 197632, srvID: DS-443352839-145.100.2.183-50050-1291128673616, blockid: blk_-9163184839986480695_4112368, duration: 4323000 The information contained in this communication may be CONFIDENTIAL and is intended only for the use of the recipient(s) named above. If you are not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited. If you have received this communication in error, please notify the sender and delete/destroy the original message and any copy of it from your computer or paper files.
-
RE: Stability issue - dead DN'sEvert Lammerts 2011-05-14, 08:53
Ok, I'll give this scenario a try (in spite of the intoxication ;-)).
= putting or getting a file A client will access the NameNode first and get a list of hostnames. These will resolve to addresses either in public or in private space, depending on whether the request to the nameserver was made by a machine in public or in private space. Each node has one NIC listening on its address in private space and one on its address in public space. The Hadoop daemons are bound to 0.0.0.0:*. Reverse DNS will return an address in private space when the client connects from one of the nodes, and (obviously) an address in public space when the request came through WAN. I'm not sure what could go wrong here... On Monday I'll recheck this scenario with our HPN guys as well. Cheers, Evert ________________________________________ From: Segel, Mike [[EMAIL PROTECTED]] Sent: Saturday, May 14, 2011 12:33 AM To: [EMAIL PROTECTED] Subject: Re: Stability issue - dead DN's Ok... Hum, look, I've been force fed a couple of margaritas so, my memory is a bit foggy... You say your clients connect on nic A. Your cluster connects on nic B. What happens when you want to upload a file from your client to HDFS? Or even access it? ... ;-) Sent from a remote device. Please excuse any typos... Mike Segel On May 13, 2011, at 4:15 PM, "Evert Lammerts" <[EMAIL PROTECTED]> wrote: > Hi Mike, > > Thanks for trying to help out. > > I had a talk with our networking guys this afternoon. According to them (and this is way out of my area of expertise, so excuse any mistakes) multiple interfaces shouldn't be a problem. We could set up a nameserver to resolve hostnames to addresses in our private space when the request comes from one of the nodes, and route this traffic over a single interface. Any other request can be resolved to an address in the public space, which is bound to an other interface. In our current setup we're not even resolving hostnames in our private address space through a nameserver - we do it with an ugly hack in /etc/hosts. And it seems to work alright. > > Having said that, our problems are still not completely gone even after adjusting the maximum allowed RAM for tasks - although things are lots better. While writing this mail three out of five DN's were marked as dead. There still is some swapping going on, but the cores are not spending any time in WAIT, so this shouldn't be the cause of anything. See below a trace from a dead DN - any thoughts are appreciated! > > Cheers, > Evert > > 2011-05-13 23:13:27,716 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-9131821326787012529_2915672 src: /192.168.28.211:60136 dest: /192.168.28.214:50050 of size 382425 > 2011-05-13 23:13:27,915 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9132067116195286882_130888 java.io.EOFException: while trying to read 3744913 bytes > 2011-05-13 23:13:27,925 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.28.214:50050, dest: /192.168.28.214:35139, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201105131125_0025_m_001437_0, offset: 196608, srvID: DS-443352839-145.100.2.183-50050-1291128673616, blockid: blk_-9163184839986480695_4112368, duration: 6254000 > 2011-05-13 23:13:28,032 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-9149862728087355005_3793421 src: /192.168.28.210:41197 dest: /192.168.28.214:50050 of size 245767 > 2011-05-13 23:13:28,033 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-9132067116195286882_130888 unfinalized and removed. > 2011-05-13 23:13:28,033 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-9132067116195286882_130888 received exception java.io.EOFException: while trying to read 3744913 bytes > 2011-05-13 23:13:28,033 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.28.214:50050, storageID=DS-443352839-145.100.2.183-50050-1291128673616, infoPort=50075, ipcPort=50020):DataXceiver The information contained in this communication may be CONFIDENTIAL and is intended only for the use of the recipient(s) named above. If you are not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited. If you have received this communication in error, please notify the sender and delete/destroy the original message and any copy of it from your computer or paper files.
-
RE: Stability issue - dead DN'sEvert Lammerts 2011-05-14, 20:08
Just to check: the NN gives back hostnames of DN's to the client when getting or putting data, and not IP addresses right?
Cheers, Evert ________________________________________ From: Evert Lammerts [[EMAIL PROTECTED]] Sent: Saturday, May 14, 2011 10:53 AM To: [EMAIL PROTECTED] Subject: RE: Stability issue - dead DN's Ok, I'll give this scenario a try (in spite of the intoxication ;-)). = putting or getting a file A client will access the NameNode first and get a list of hostnames. These will resolve to addresses either in public or in private space, depending on whether the request to the nameserver was made by a machine in public or in private space. Each node has one NIC listening on its address in private space and one on its address in public space. The Hadoop daemons are bound to 0.0.0.0:*. Reverse DNS will return an address in private space when the client connects from one of the nodes, and (obviously) an address in public space when the request came through WAN. I'm not sure what could go wrong here... On Monday I'll recheck this scenario with our HPN guys as well. Cheers, Evert ________________________________________ From: Segel, Mike [[EMAIL PROTECTED]] Sent: Saturday, May 14, 2011 12:33 AM To: [EMAIL PROTECTED] Subject: Re: Stability issue - dead DN's Ok... Hum, look, I've been force fed a couple of margaritas so, my memory is a bit foggy... You say your clients connect on nic A. Your cluster connects on nic B. What happens when you want to upload a file from your client to HDFS? Or even access it? ... ;-) Sent from a remote device. Please excuse any typos... Mike Segel On May 13, 2011, at 4:15 PM, "Evert Lammerts" <[EMAIL PROTECTED]> wrote: > Hi Mike, > > Thanks for trying to help out. > > I had a talk with our networking guys this afternoon. According to them (and this is way out of my area of expertise, so excuse any mistakes) multiple interfaces shouldn't be a problem. We could set up a nameserver to resolve hostnames to addresses in our private space when the request comes from one of the nodes, and route this traffic over a single interface. Any other request can be resolved to an address in the public space, which is bound to an other interface. In our current setup we're not even resolving hostnames in our private address space through a nameserver - we do it with an ugly hack in /etc/hosts. And it seems to work alright. > > Having said that, our problems are still not completely gone even after adjusting the maximum allowed RAM for tasks - although things are lots better. While writing this mail three out of five DN's were marked as dead. There still is some swapping going on, but the cores are not spending any time in WAIT, so this shouldn't be the cause of anything. See below a trace from a dead DN - any thoughts are appreciated! > > Cheers, > Evert > > 2011-05-13 23:13:27,716 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-9131821326787012529_2915672 src: /192.168.28.211:60136 dest: /192.168.28.214:50050 of size 382425 > 2011-05-13 23:13:27,915 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9132067116195286882_130888 java.io.EOFException: while trying to read 3744913 bytes > 2011-05-13 23:13:27,925 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.28.214:50050, dest: /192.168.28.214:35139, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201105131125_0025_m_001437_0, offset: 196608, srvID: DS-443352839-145.100.2.183-50050-1291128673616, blockid: blk_-9163184839986480695_4112368, duration: 6254000 > 2011-05-13 23:13:28,032 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-9149862728087355005_3793421 src: /192.168.28.210:41197 dest: /192.168.28.214:50050 of size 245767 > 2011-05-13 23:13:28,033 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-9132067116195286882_130888 unfinalized and removed. > 2011-05-13 23:13:28,033 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-9132067116195286882_130888 received exception java.io.EOFException: while trying to read 3744913 bytes The information contained in this communication may be CONFIDENTIAL and is intended only for the use of the recipient(s) named above. If you are not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited. If you have received this communication in error, please notify the sender and delete/destroy the original message and any copy of it from your computer or paper files. |