Occasional regionserver crashes following socket errors writing to HDFS
Hi,
We're seeing occasional regionserver crashes during heavy write operations
to HBase (in the reduce phase of large M/R jobs). I have increased the file
descriptors, HDFS xceivers, and HDFS thread counts to the recommended
settings, and in fact well above them.
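
To be concrete, these are the kinds of knobs I mean -- the values below
are illustrative examples rather than an exact copy of our configs:

  # /etc/security/limits.conf -- open file descriptors for the hdfs and hbase users
  hdfs   -  nofile  65536
  hbase  -  nofile  65536

  <!-- hdfs-site.xml -->
  <property>
    <name>dfs.datanode.max.xcievers</name>  <!-- yes, the property name really is spelled "xcievers" -->
    <value>8192</value>
  </property>
  <property>
    <name>dfs.datanode.handler.count</name> <!-- DataNode server threads -->
    <value>10</value>
  </property>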

Here is an example of the HBase log (showing only errors):

2012-05-10 03:34:54,291 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_-8928911185099340956_5189425
java.io.IOException: Bad response 1 for block blk_-8928911185099340956_5189425 from datanode 10.1.104.6:50010
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2986)

2012-05-10 03:34:54,494 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.SocketChannel[connected local=/10.1.104.9:59642 remote=/10.1.104.9:50010]. 0 millis timeout left.
        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:349)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2848)

2012-05-10 03:34:54,531 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-8928911185099340956_5189425 bad datanode[2] 10.1.104.6:50010
2012-05-10 03:34:54,531 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-8928911185099340956_5189425 in pipeline 10.1.104.9:50010, 10.1.104.8:50010, 10.1.104.6:50010: bad datanode 10.1.104.6:50010
2012-05-10 03:48:30,174 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=hadoop1-s09.farm-ny.gigya.com,60020,1336476100422, load=(requests=15741, regions=789, usedHeap=6822, maxHeap=7983): regionserver:60020-0x2372c0e8a2f0008 regionserver:60020-0x2372c0e8a2f0008 received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:352)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:270)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:531)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:507)
java.io.InterruptedIOException: Aborting compaction of store properties in region gs_users,6155551|QoCW/euBIKuMat/nRC5Xtw==,1334983658004.878522ea91f41cd76b903ea06ccd17f9. because user requested stop.
        at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:998)
        at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:779)
        at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:776)
        at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:721)
        at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:81)

This is from 10.1.104.9 (same machine running the region server that
crashed):

2012-05-10 03:31:16,785 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-8928911185099340956_5189425 src: /10.1.104.9:59642 dest: /10.1.104.9:50010
2012-05-10 03:35:39,000 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_-8928911185099340956_5189425 2 Exception java.net.SocketException: Connection reset
2012-05-10 03:35:39,052 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-8928911185099340956_5189425 java.nio.channels.ClosedByInterruptException
2012-05-10 03:35:39,053 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-8928911185099340956_5189425 received exception java.io.IOException: Interrupted receiveBlock
2012-05-10 03:35:39,055 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: Block blk_-8928911185099340956_5189425 length is 24384000 does not match block file length 24449024
2012-05-10 03:35:39,055 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 50020, call startBlockRecovery(blk_-8928911185099340956_5189425) from 10.1.104.8:50251: error: java.io.IOException: Block blk_-8928911185099340956_5189425 length is 24384000 does not match block file length 24449024
java.io.IOException: Block blk_-8928911185099340956_5189425 length is 24384000 does not match block file length 24449024
2012-05-10 03:35:39,077 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_-8928911185099340956_5189425 2 Exception java.net.SocketException: Broken pipe
2012-05-10 03:35:39,077 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_-8928911185099340956_5189425 2 Exception java.net.SocketException: Socket closed
2012-05-10 03:35:39,108 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_-8928911185099340956_5189425 2 Exception java.net.SocketException: Socket closed
2012-05-10 03:35:39,136 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_-8928911185099340956_5189425 2 Exception java.net.SocketException: Socket closed
2012-05-10 03:35:39,165 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_-8928911185099340956_5189425 2 Exception java.net.SocketException: Socket closed
2012-05-10 03:35:39,196 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_-8928911185099340956_5189425 2 Exception java.net.SocketException: Socket closed
2012-05-10 03:35:39,221 INFO org.apache.hadoop.hdfs.server
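
As far as I can tell the sequence is: the HDFS write pipeline to the
datanodes breaks, the DataStreamer gets interrupted, and the regionserver
is finally aborted because its ZooKeeper session expires (the FATAL line
above). If the root cause turns out to be a long stall rather than a
genuinely dead datanode, the relevant knob would be the session timeout --
a minimal sketch with illustrative values, not what we actually run:

  <!-- hbase-site.xml -->
  <property>
    <name>zookeeper.session.timeout</name>
    <value>120000</value>  <!-- milliseconds -->
  </property>

  # zoo.cfg on the ZooKeeper servers has to allow sessions this long
  maxSessionTimeout=120000

Of course I'd rather fix whatever is stalling the HDFS writes than paper
over it with a longer session timeout, hence this mail.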

Replies in this thread:
dva 2012-08-30, 06:26
Stack 2012-08-30, 22:36
Stack 2012-05-11, 05:07
Igal Shilman 2012-05-10, 09:25
Eran Kutner 2012-05-10, 11:33
Michel Segel 2012-05-10, 11:53
Eran Kutner 2012-05-10, 12:22
Michael Segel 2012-05-10, 13:26
Dave Revell 2012-05-10, 17:31
Michael Segel 2012-05-10, 18:30
Dave Revell 2012-05-10, 18:41
Michael Segel 2012-05-10, 18:59
Eran Kutner 2012-05-10, 19:17
Michael Segel 2012-05-10, 19:50
Stack 2012-05-10, 21:57
Michael Segel 2012-05-11, 02:46
Stack 2012-05-11, 03:34
Michael Segel 2012-05-11, 01:28
Stack 2012-05-11, 03:28
Michael Segel 2012-05-11, 03:44
Stack 2012-05-11, 03:53
Stack 2012-05-11, 05:12
Michael Segel 2012-05-11, 11:36
Eran Kutner 2012-05-24, 11:15
Michael Segel 2012-05-24, 12:13
Stack 2012-05-24, 23:39
Dave Revell 2012-05-25, 19:52
Stack 2012-05-11, 05:08