-
Could not obtain block
Evert Lammerts 2011-03-09, 11:27
We see a lot of IOExceptions coming from HDFS during a job that does nothing but untar 100 files (1 per Mapper, sizes vary between 5GB and 80GB) that are in HDFS, to HDFS. DataNodes are also showing Exceptions that I think are related. (See stacktraces below.)
This job should not be able to overload the system, I think. I realize that a lot of data needs to go over the network, but HDFS should still be responsive. Any ideas or help would be much appreciated!

Some details:
* Hadoop 0.20.2 (CDH3b4)
* 5 node cluster plus 1 node for JT/NN (Sun Thumpers)
* 4 cores/node, 4GB RAM/core
* CentOS 5.5

Job output:

java.io.IOException: java.io.IOException: Could not obtain block: blk_-3695352030358969086_130839 file=/user/emeij/icwsm-data-test/01-26-SOCIAL_MEDIA.tar.gz
    at ilps.DownloadICWSM$UntarMapper.map(DownloadICWSM.java:449)
    at ilps.DownloadICWSM$UntarMapper.map(DownloadICWSM.java:1)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:390)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:234)
Caused by: java.io.IOException: Could not obtain block: blk_-3695352030358969086_130839 file=/user/emeij/icwsm-data-test/01-26-SOCIAL_MEDIA.tar.gz
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1977)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1784)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1932)
    at java.io.DataInputStream.read(DataInputStream.java:83)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:55)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:74)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:335)
    at ilps.DownloadICWSM$CopyThread.run(DownloadICWSM.java:149)

Example DataNode Exceptions (note that these come from the node at 192.168.28.211):

2011-03-08 19:40:40,297 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9222067946733189014_3798233 java.io.EOFException: while trying to read 3067064 bytes
2011-03-08 19:40:41,018 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.28.211:50050, dest: /192.168.28.211:49748, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201103071120_0030_m_000032_0, offset: 3072, srvID: DS-568746059-145.100.2.180-50050-1291128670510, blockid: blk_3596618013242149887_4060598, duration: 2632000
2011-03-08 19:40:41,049 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9221028436071074510_2325937 java.io.EOFException: while trying to read 2206400 bytes
2011-03-08 19:40:41,348 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9221549395563181322_4024529 java.io.EOFException: while trying to read 3037288 bytes
2011-03-08 19:40:41,357 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9221885906633018147_3895876 java.io.EOFException: while trying to read 1981952 bytes
2011-03-08 19:40:41,434 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-9221885906633018147_3895876 unfinalized and removed.
2011-03-08 19:40:41,434 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-9221885906633018147_3895876 received exception java.io.EOFException: while trying to read 1981952 bytes
2011-03-08 19:40:41,434 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.28.211:50050, storageID=DS-568746059-145.100.2.180-50050-1291128670510, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 1981952 bytes
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:270)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:357)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:378)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:534)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:417)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122)
2011-03-08 19:40:41,465 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-9221549395563181322_4024529 unfinalized and removed.
2011-03-08 19:40:41,466 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-9221549395563181322_4024529 received exception java.io.EOFException: while trying to read 3037288 bytes
2011-03-08 19:40:41,466 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.28.211:50050, storageID=DS-568746059-145.100.2.180-50050-1291128670510, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 3037288 bytes
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:270)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:357)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:378)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:534)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:417)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122)

Cheers,
Evert Lammerts
Consultant eScience & Cloud Services
SARA Computing & Network Services
Operations, Support & Development
Phone: +31 20 888 4101
Email: [EMAIL PROTECTED]
http://www.sara.nl
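One way to relate the two sets of errors is to pull the block IDs out of the DataNode log and check whether the block the client could not read is among the failed writes. The sketch below is only an illustration: the regular expression is an assumption fitted to the log lines quoted in this message, and the names `RECEIVE_ERROR`, `UNREADABLE_BLOCK`, `SAMPLE` and `failed_receives` are hypothetical, not part of Hadoop.

```python
import re

# Assumed pattern, fitted to the "Exception in receiveBlock" lines quoted above.
RECEIVE_ERROR = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} \w+ \S+ "
    r"Exception in receiveBlock for block (blk_-?\d+_\d+) "
    r"java\.io\.EOFException: while trying to read (\d+) bytes"
)

# Block the client could not read, taken from the job output above.
UNREADABLE_BLOCK = "blk_-3695352030358969086_130839"

# Log lines copied from the message. The WARN line is included to show
# that lines which are not receiveBlock failures are skipped.
SAMPLE = """\
2011-03-08 19:40:40,297 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9222067946733189014_3798233 java.io.EOFException: while trying to read 3067064 bytes
2011-03-08 19:40:41,049 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9221028436071074510_2325937 java.io.EOFException: while trying to read 2206400 bytes
2011-03-08 19:40:41,348 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9221549395563181322_4024529 java.io.EOFException: while trying to read 3037288 bytes
2011-03-08 19:40:41,357 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9221885906633018147_3895876 java.io.EOFException: while trying to read 1981952 bytes
2011-03-08 19:40:41,434 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-9221885906633018147_3895876 unfinalized and removed.
"""


def failed_receives(lines):
    """Yield (timestamp, block_id, bytes_expected) for each failed receiveBlock."""
    for line in lines:
        match = RECEIVE_ERROR.match(line.strip())
        if match:
            timestamp, block, nbytes = match.groups()
            yield timestamp, block, int(nbytes)


if __name__ == "__main__":
    rows = list(failed_receives(SAMPLE.splitlines()))
    for timestamp, block, nbytes in rows:
        print(timestamp, block, nbytes)
    failed_blocks = {block for _, block, _ in rows}
    print("unreadable block among failed writes:", UNREADABLE_BLOCK in failed_blocks)
```

On the excerpt quoted here, the failed writes all concern blocks other than the one the mapper could not read, so the two symptoms are not tied to the same block in this sample.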
-
Re: Could not obtain block
Marcos Ortiz 2011-03-09, 15:31
On 3/9/2011 6:27 AM, Evert Lammerts wrote:
> We see a lot of IOExceptions coming from HDFS during a job that does nothing but untar 100 files (1 per Mapper, sizes vary between 5GB and 80GB) that are in HDFS, to HDFS. DataNodes are also showing Exceptions that I think are related. (See stacktraces below.)
>
> Job output:
>
> java.io.IOException: java.io.IOException: Could not obtain block: blk_-3695352030358969086_130839 file=/user/emeij/icwsm-data-test/01-26-SOCIAL_MEDIA.tar.gz

What is the output of:
    bin/hadoop dfsadmin -report

What is the output of:
    bin/hadoop fsck /user/emeij/icwsm-data-test/

> Caused by: java.io.IOException: Could not obtain block: blk_-3695352030358969086_130839 file=/user/emeij/icwsm-data-test/01-26-SOCIAL_MEDIA.tar.gz

What is the output of:
    bin/hadoop fsck /user/emeij/icwsm-data-test/01-26-SOCIAL_MEDIA.tar.gz -files -blocks -racks

> Example DataNode Exceptions (note that these come from the node at 192.168.28.211):
>
> 2011-03-08 19:40:41,434 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-9221885906633018147_3895876 received exception java.io.EOFException: while trying to read 1981952 bytes

Then, on the DataNode that holds the block in question (blk_-3695352030358969086_130839), you can visit the web interface at http://192.168.28.211:50075/blockScannerReport to see what is happening on the node.

Regards,
Marcos Luís Ortíz Valmaseda
Software Engineer
Universidad de las Ciencias Informáticas
Linux User # 418229
http://uncubanitolinuxero.blogspot.com
http://www.linkedin.com/in/marcosluis2186
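The checks requested in this reply can be collected into a small helper so they are run the same way each time. This is a minimal sketch, assuming the `bin/hadoop` launcher path and the DataNode info port 50075 that appear in the thread; the function names `diagnostic_commands` and `block_scanner_url` are hypothetical. The fsck options are written with a single dash (`-files -blocks -racks`), which is how fsck spells them. The script only prints the commands; it does not execute them.

```python
import posixpath
import shlex


def diagnostic_commands(hdfs_file, hadoop="bin/hadoop"):
    """Return the three checks from the reply as argv lists."""
    parent = posixpath.dirname(hdfs_file) + "/"
    return [
        [hadoop, "dfsadmin", "-report"],                         # cluster-wide DataNode status
        [hadoop, "fsck", parent],                                # health of the input directory
        [hadoop, "fsck", hdfs_file, "-files", "-blocks", "-racks"],  # block locations of one file
    ]


def block_scanner_url(datanode_host, info_port=50075):
    """Build the block scanner report URL for one DataNode."""
    return "http://%s:%d/blockScannerReport" % (datanode_host, info_port)


if __name__ == "__main__":
    target = "/user/emeij/icwsm-data-test/01-26-SOCIAL_MEDIA.tar.gz"
    for argv in diagnostic_commands(target):
        print(" ".join(shlex.quote(arg) for arg in argv))
    print(block_scanner_url("192.168.28.211"))
```

Returning argv lists rather than shell strings means the same lists could be passed to `subprocess.run` on a cluster node without extra quoting.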
-
RE: Could not obtain block
Evert Lammerts 2011-03-09, 16:09
I didn't mention it but the complete filesystem is reported healthy by fsck. I'm guessing that the java.io.EOFException indicates a problem caused by the load of the job.
Any ideas?
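The guess that the EOFExceptions are load-related could be checked by counting how many of them the DataNode logs per second and comparing the bursts with the times at which the map tasks were running. The sketch below covers only the counting half, under the assumption that every log record starts with a `YYYY-MM-DD HH:MM:SS` timestamp as in the excerpts quoted in this thread; `eof_per_second` and `SAMPLE` are hypothetical names.

```python
from collections import Counter
from datetime import datetime

# Log lines copied from the thread, including one stack-trace header line
# that has no leading timestamp.
SAMPLE = """\
2011-03-08 19:40:40,297 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9222067946733189014_3798233 java.io.EOFException: while trying to read 3067064 bytes
2011-03-08 19:40:41,049 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9221028436071074510_2325937 java.io.EOFException: while trying to read 2206400 bytes
2011-03-08 19:40:41,348 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9221549395563181322_4024529 java.io.EOFException: while trying to read 3037288 bytes
2011-03-08 19:40:41,357 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9221885906633018147_3895876 java.io.EOFException: while trying to read 1981952 bytes
2011-03-08 19:40:41,434 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-9221885906633018147_3895876 unfinalized and removed.
2011-03-08 19:40:41,434 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-9221885906633018147_3895876 received exception java.io.EOFException: while trying to read 1981952 bytes
java.io.EOFException: while trying to read 1981952 bytes
"""


def eof_per_second(lines):
    """Count timestamped log lines mentioning java.io.EOFException, per second."""
    counts = Counter()
    for line in lines:
        if "java.io.EOFException" not in line:
            continue
        try:
            stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # stack-trace line without a leading timestamp
        counts[stamp] += 1
    return counts


if __name__ == "__main__":
    for stamp, count in sorted(eof_per_second(SAMPLE.splitlines()).items()):
        print(stamp.isoformat(sep=" "), count)
```

A count like this says nothing on its own about the cause; it only shows whether the exceptions cluster in time, which is what a load-related explanation would predict.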
-
Re: Could not obtain block
Marcos Ortiz 2011-03-09, 16:58
On 3/9/2011 11:09 AM, Evert Lammerts wrote:
> I didn't mention it but the complete filesystem is reported healthy by fsck. I'm guessing that the java.io.EOFException indicates a problem caused by the load of the job.
>
> Any ideas?

Debugging a MapReduce job execution is tricky work, but I'll try.

> java.io.EOFException: while trying to read 1981952 bytes
>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:270)
>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:357)
>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:378)
>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:534)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:417)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122)
> 2011-03-08 19:40:41,465 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-9221549395563181322_4024529 unfinalized and removed.

1- Did you check this?
2- What are the file permissions on /user/emeij/icwsm-data-test/ ?

If fsck reports that everything is fine, I really don't know what else to suggest.

Regards,
--
Marcos Luís Ortíz Valmaseda
Software Engineer
Universidad de las Ciencias Informáticas
Linux User # 418229
http://uncubanitolinuxero.blogspot.com
http://www.linkedin.com/in/marcosluis2186