HBase user mailing list: Occasional regionserver crashes following socket errors writing to HDFS


Re: Occasional regionserver crashes following socket errors writing to HDFS
Silly question...
Why are you using a reducer when working with HBase?

Second silly question... What is the max file size of the table you are writing to?

Third silly question... How many regions are on each of your region servers?

Fourth silly question... There is this bandwidth setting... Default is 10MB... Did you modify it?

Sent from a remote device. Please excuse any typos...

Mike Segel
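
The "max file size" in the second question presumably refers to the region split threshold, hbase.hregion.max.filesize in hbase-site.xml. A minimal sketch follows, with an assumed 1 GB value that is illustrative only and not taken from this thread:

 <property>
   <name>hbase.hregion.max.filesize</name>
   <!-- Assumed example value (1 GB): a region is split once one of its store files grows past this size -->
   <value>1073741824</value>
 </property>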

On May 10, 2012, at 6:33 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:

> Thanks Igal, but we already have that setting. These are the relevant
> settings from hdfs-site.xml:
>  <property>
>    <name>dfs.datanode.max.xcievers</name>
>    <value>65536</value>
>  </property>
>  <property>
>    <name>dfs.datanode.handler.count</name>
>    <value>10</value>
>  </property>
>  <property>
>    <name>dfs.datanode.socket.write.timeout</name>
>    <value>0</value>
>  </property>
>
> Other ideas?
>
> -eran
>
>
>
> On Thu, May 10, 2012 at 12:25 PM, Igal Shilman <[EMAIL PROTECTED]> wrote:
>
>> Hi Eran,
>> Do you have dfs.datanode.socket.write.timeout set in hdfs-site.xml?
>> (We have set this to zero in our cluster, which means waiting as long as
>> necessary for the write to complete)
>>
>> Igal.
>>
>> On Thu, May 10, 2012 at 11:17 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>> We're seeing occasional regionserver crashes during heavy write operations
>>> to HBase (at the reduce phase of large M/R jobs). I have increased the file
>>> descriptors, HDFS xceivers and HDFS threads to the recommended settings,
>>> and actually way above them.
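
The file descriptor increase mentioned above is normally done in /etc/security/limits.conf for the account running the DataNode and RegionServer processes. A minimal sketch, assuming a user named "hadoop" and the commonly cited 32768 limit (both the user name and the value are assumptions, not details from this thread):

hadoop  soft  nofile  32768
hadoop  hard  nofile  32768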
>>>
>>> Here is an example of the HBase log (showing only errors):
>>>
>>> 2012-05-10 03:34:54,291 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_-8928911185099340956_5189425java.io.IOException: Bad response 1 for block blk_-8928911185099340956_5189425 from datanode 10.1.104.6:50010
>>>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2986)
>>>
>>> 2012-05-10 03:34:54,494 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.SocketChannel[connected local=/10.1.104.9:59642 remote=/10.1.104.9:50010]. 0 millis timeout left.
>>>        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:349)
>>>        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
>>>        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
>>>        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
>>>        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>>>        at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2848)
>>>
>>> 2012-05-10 03:34:54,531 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-8928911185099340956_5189425 bad datanode[2] 10.1.104.6:50010
>>> 2012-05-10 03:34:54,531 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-8928911185099340956_5189425 in pipeline 10.1.104.9:50010, 10.1.104.8:50010, 10.1.104.6:50010: bad datanode 10.1.104.6:50010
>>> 2012-05-10 03:48:30,174 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=hadoop1-s09.farm-ny.gigya.com,60020,1336476100422, load=(requests=15741, regions=789, usedHeap=6822, maxHeap=7983): regionserver:60020-0x2372c0e8a2f0008 regionserver:60020-0x2372c0e8a2f0008 received expired from ZooKeeper, aborting
>>> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
>>>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:352)
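
The FATAL message above is HBase's standard reaction to an expired ZooKeeper session: the region server was unresponsive for longer than its session timeout (for example during a long garbage collection pause or an I/O stall), so it aborts. One commonly discussed mitigation is raising zookeeper.session.timeout in hbase-site.xml; the value below is an illustrative assumption, and the effective timeout is also bounded by the ZooKeeper server's own maxSessionTimeout setting:

 <property>
   <name>zookeeper.session.timeout</name>
   <!-- Assumed example value (3 minutes); must also be permitted by the ZooKeeper server's maxSessionTimeout -->
   <value>180000</value>
 </property>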