-
DFSClient error
Mohit Anchlia, 2012-04-26, 23:49

I had 20 mappers running in parallel, reading 20 gz files of around 30-40 MB each across 5 Hadoop nodes and then writing to the analytics database. About midway through, this error started to appear:

2012-04-26 16:13:53,723 [Thread-8] INFO org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 17.18.62.192:50010 java.io.IOException: Bad connect ack with firstBadLink as 17.18.62.191:50010

I am looking at the logs, but they don't say much. What could be the reason? We are on a fairly closed, reliable network, and all machines are up.
-
Re: DFSClient error
Harsh J, 2012-04-27, 05:24

Is the same IP printed in all such messages? Can you check the DN log on that machine to see whether it reports any issues?

Also, did your jobs fail, or did they keep going despite these hiccups? I notice you're threading your clients, though (?), but I can't tell whether that may cause this without further information.

-- Harsh J
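Whether one datanode or many appear in these messages can be checked mechanically. The following is a minimal Python sketch, assuming the client log has the same shape as the lines quoted in this thread. The function name `tally_datanodes` and both regular expressions are illustrative and are not part of Hadoop.

```python
import re
from collections import Counter

# Address of the first datanode in the write pipeline, as printed after
# "Exception in createBlockOutputStream". \s+ also tolerates a wrapped line.
TARGET_RE = re.compile(
    r"Exception in createBlockOutputStream\s+(\d+\.\d+\.\d+\.\d+:\d+)"
)
# Address reported as the first failing link in the pipeline.
BAD_LINK_RE = re.compile(r"firstBadLink as\s+(\d+\.\d+\.\d+\.\d+:\d+)")


def tally_datanodes(log_text):
    """Count pipeline targets and firstBadLink addresses found in log_text."""
    targets = Counter(TARGET_RE.findall(log_text))
    bad_links = Counter(BAD_LINK_RE.findall(log_text))
    return targets, bad_links


if __name__ == "__main__":
    # The single log line quoted in the first message of this thread.
    sample = (
        "2012-04-26 16:13:53,723 [Thread-8] INFO "
        "org.apache.hadoop.hdfs.DFSClient - Exception in "
        "createBlockOutputStream 17.18.62.192:50010java.io.IOException: "
        "Bad connect ack with firstBadLink as 17.18.62.191:50010\n"
    )
    targets, bad_links = tally_datanodes(sample)
    for address, count in bad_links.most_common():
        print(address, count)
```

If a single address dominates the firstBadLink tally, that machine's DN log is the place to look first. An even spread across all nodes points toward a cluster-wide limit or setting instead.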
-
Re: DFSClient error
Mohit Anchlia, 2012-04-27, 14:50

On Thu, Apr 26, 2012 at 10:24 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Is only the same IP printed in all such messages? Can you check the DN
> log in that machine to see if it reports any form of issues?

All IPs were logged with this message.

> Also, did your jobs fail or kept going despite these hiccups? I notice
> you're threading your clients though (?), but I can't tell if that may
> cause this without further information.

It started with this error message, and slowly all the jobs died with "shortRead" errors. I am not sure about threading; I am using a Pig script to read the .gz files.
-
Re: DFSClient error
Mohit Anchlia, 2012-04-27, 21:36

I even tried reducing the number of jobs, but it didn't help. This is what I see:

datanode logs:

Initializing secure datanode resources
Successfully obtained privileged resources (streaming port ServerSocket[addr=/0.0.0.0,localport=50010] ) (http listener port sun.nio.ch.ServerSocketChannelImpl[/0.0.0.0:50075])
Starting regular datanode initialization
26/04/2012 17:06:51 9858 jsvc.exec error: Service exit with a return value of 143

userlogs:

2012-04-26 19:35:22,801 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library is available
2012-04-26 19:35:22,801 INFO org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library loaded
2012-04-26 19:35:22,808 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2012-04-26 19:35:22,903 INFO org.apache.hadoop.hdfs.DFSClient: Failed to connect to /125.18.62.197:50010, add to deadNodes and continue
java.io.EOFException
    at java.io.DataInputStream.readShort(DataInputStream.java:298)
    at org.apache.hadoop.hdfs.DFSClient$RemoteBlockReader.newBlockReader(DFSClient.java:1664)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.getBlockReader(DFSClient.java:2383)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:2056)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2170)
    at java.io.DataInputStream.read(DataInputStream.java:132)
    at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:97)
    at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:87)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:75)
    at java.io.InputStream.read(InputStream.java:85)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:205)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:169)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:114)
    at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:109)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:456)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
2012-04-26 19:35:22,906 INFO org.apache.hadoop.hdfs.DFSClient: Failed to connect to /125.18.62.204:50010, add to deadNodes and continue
java.io.EOFException

namenode logs:

2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.JobTracker: Job job_201204261140_0244 added successfully for user 'hadoop' to queue 'default'
2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201204261140_0244
2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.AuditLogger: USER=hadoop IP=125.18.62.196 OPERATION=SUBMIT_JOB TARGET=job_201204261140_0244 RESULT=SUCCESS
2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201204261140_0244
2012-04-26 16:12:53,581 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 125.18.62.198:50010 java.io.IOException: Bad connect ack with firstBadLink as 125.18.62.197:50010
2012-04-26 16:12:53,581 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_2499580289951080275_22499
2012-04-26 16:12:53,582 INFO org.apache.hadoop.hdfs.DFSClient: Excluding datanode 125.18.62.197:50010
2012-04-26 16:12:53,594 INFO org.apache.hadoop.mapred.JobInProgress: jobToken generated and stored with users keys in /data/hadoop/mapreduce/job_201204261140_0244/jobToken
2012-04-26 16:12:53,598 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201204261140_0244 = 73808305. Number of splits = 1
2012-04-26 16:12:53,598 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201204261140_0244_m_000000 has split on node:/default-rack/dsdb4.corp.intuit.net
2012-04-26 16:12:53,598 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201204261140_0244_m_000000 has split on node:/default-rack/dsdb5.corp.intuit.net
2012-04-26 16:12:53,598 INFO org.apache.hadoop.mapred.JobInProgress: job_201204261140_0244 LOCALITY_WAIT_FACTOR=0.4
2012-04-26 16:12:53,598 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201204261140_0244 initialized successfully with 1 map tasks and 0 reduce tasks.
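The datanode log in this message ends with a service exit value of 143. By the common POSIX shell convention, an exit status above 128 is 128 plus the number of the signal that ended the process, which would make 143 a SIGTERM (for example, from a stop or restart) rather than a crash. That jsvc follows this convention here is an assumption, and the helper name `signal_from_exit_status` is hypothetical.

```python
import signal


def signal_from_exit_status(status):
    """Return the signal name implied by a 128+N exit status, or None."""
    if status <= 128:
        return None  # ordinary exit code, not a signal-style status
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        return None  # no signal with that number on this platform


if __name__ == "__main__":
    print(signal_from_exit_status(143))
```

If the value does map to SIGTERM, the line most likely records the cluster restart described later in the thread, not the original failure.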
-
Re: DFSClient error
John George, 2012-04-27, 22:12

Can you run a regular 'hadoop fs' (put, ls, or get) command?
If yes, how about a wordcount example?
'<path>/hadoop jar <path>hadoop-*examples*.jar wordcount input output'
-
Re: DFSClient error
Mohit Anchlia, 2012-04-27, 22:45

After all the jobs fail, I can't run anything. Once I restart the cluster, I am able to run other jobs with no problems; hadoop fs and other I/O-intensive jobs run just fine.
-
Re: DFSClient error
Mohit Anchlia, 2012-04-29, 20:05

I lowered the number of parallel jobs even further, but I still get these errors. Any suggestion on how to troubleshoot this issue would be very helpful. Should I run hadoop fsck? How do people troubleshoot such issues? Does it sound like a bug?

2012-04-27 14:37:42,921 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-04-27 14:37:42,931 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 125.18.62.199:50010 java.io.EOFException
2012-04-27 14:37:42,932 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_6343044536824463287_24619
2012-04-27 14:37:42,932 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Excluding datanode 125.18.62.199:50010
2012-04-27 14:37:42,935 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 125.18.62.204:50010 java.io.EOFException
2012-04-27 14:37:42,935 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_2837215798109471362_24620
2012-04-27 14:37:42,936 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Excluding datanode 125.18.62.204:50010
2012-04-27 14:37:42,937 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-04-27 14:37:42,939 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 125.18.62.198:50010 java.io.EOFException
2012-04-27 14:37:42,939 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_2223489090936415027_24620
2012-04-27 14:37:42,940 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Excluding datanode 125.18.62.198:50010
2012-04-27 14:37:42,943 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 125.18.62.197:50010 java.io.EOFException
2012-04-27 14:37:42,943 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_1265169201875643059_24620
2012-04-27 14:37:42,944 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Excluding datanode 125.18.62.197:50010
2012-04-27 14:37:42,945 [Thread-5] WARN org.apache.hadoop.hdfs.DFSClient - DataStreamer Exception: java.io.IOException: Unable to create new block.
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3446)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2100(DFSClient.java:2627)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2822)
2012-04-27 14:37:42,945 [Thread-5] WARN org.apache.hadoop.hdfs.DFSClient - Error Recovery for block blk_1265169201875643059_24620 bad datanode[0] nodes == null
2012-04-27 14:37:42,945 [Thread-5] WARN org.apache.hadoop.hdfs.DFSClient - Could not get block locations. Source file "/tmp/hadoop-hadoop/mapred/staging/hadoop/.staging/job_201204261707_0411/job.jar" - Aborting...
2012-04-27 14:37:42,945 [Thread-4] INFO org.apache.hadoop.mapred.JobClient - Cleaning up the staging area hdfs://dsdb1:54310/tmp/hadoop-hadoop/mapred/staging/hadoop/.staging/job_201204261707_0411
2012-04-27 14:37:42,945 [Thread-4] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.EOFException
2012-04-27 14:37:42,996 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 125.18.62.200:50010 java.io.IOException: Bad connect ack with firstBadLink as 125.18.62.198:50010
2012-04-27 14:37:42,996 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_-7583284266913502018_24621
2012-04-27 14:37:42,997 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 125.18.62.198:50010 java.io.EOFException
2012-04-27 14:37:42,997 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_4207260385919079785_24622
2012-04-27 14:37:42,998 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Excluding datanode 125.18.62.198:50010
2012-04-27 14:37:43,000 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Excluding datanode 125.18.62.198:50010
2012-04-27 14:37:43,002 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 125.18.62.197:50010 java.io.EOFException
2012-04-27 14:37:43,002 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_-2859304645525022496_24624
2012-04-27 14:37:43,003 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Excluding datanode 125.18.62.197:50010
2012-04-27 14:37:43,003 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 125.18.62.198:50010 java.io.EOFException
2012-04-27 14:37:43,004 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_-5091361633954135154_24622
2012-04-27 14:37:43,004 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 125.18.62.199:50010 java.io.EOFException
2012-04-27 14:37:43,004 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_-1445223397912067500_24624
2012-04-27 14:37:43,005 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Excluding datanode 125.18.62.198:50010
2012-04-27 14:37:43,005 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Excluding datanode 125.18.62.199:50010
2012-04-27 14:37:43,006 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 125.18.62.204:50010 java.io.EOFException
2012-04-27 14:37:43,006 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_4137744363907213546_24624
2012-04-27 14:37:43,007 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Excluding datanode 125.18.62.204:50010
2012-04-27 14:37:43,008 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 125.18.62.204:50010 java.io.EOFException
2012-04-27 14:37:43,008 [Thread-5] INFO org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_45536925356783765
-
Re: DFSClient error
Harsh J, 2012-04-29, 20:14

It sounds to me like you're running out of DN xceivers. Try the solution offered at http://hbase.apache.org/book.html#dfs.datanode.max.xcievers

I.e., add:

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>

To your DNs' config/hdfs-site.xml and restart the DNs.

-- Harsh J
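Before restarting the DNs, it can help to confirm that each node's hdfs-site.xml actually carries the new property. The following is a minimal Python sketch using only the standard library. The helper name `read_property` and the inline `EXAMPLE` document are illustrative, and the check assumes the usual `<configuration>`/`<property>` layout of Hadoop config files.

```python
import xml.etree.ElementTree as ET


def read_property(xml_text, name):
    """Return the <value> of the named <property>, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Illustrative config text containing the property suggested in this thread.
EXAMPLE = """<configuration>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(read_property(EXAMPLE, "dfs.datanode.max.xcievers"))
```

To check a real node, read that node's hdfs-site.xml into a string and pass it in place of `EXAMPLE`. A result of `None` means the property is not set in that file, so the DN is running with whatever default its Hadoop version uses.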
-
Re: DFSClient error
Mohit Anchlia, 2012-04-29, 20:25

Thanks for the quick response; I appreciate it. It looks like this might be the issue, but I am still trying to understand what is causing so many threads in my situation. Is a thread created per block or per file? If it's per file, there should not be more than 15.

My second question: I read around 5 .gz files in 5 separate processes. This is constant, and the 5 files are roughly the same size. So why does it fail only halfway through and not right at the beginning? I am reading around 400 files, and it always fails when I reach around the 180th file.

What's the default value of xceivers? Would 4096 consume too much stack space?

Thanks