HBase user mailing list: Occasional regionserver crashes following socket errors writing to HDFS


Eran Kutner 2012-05-10, 08:17
dva 2012-08-30, 06:26
Stack 2012-08-30, 22:36
Stack 2012-05-11, 05:07
Igal Shilman 2012-05-10, 09:25
Eran Kutner 2012-05-10, 11:33
Michel Segel 2012-05-10, 11:53
Re: Occasional regionserver crashes following socket errors writing to HDFS
Hi Mike,
Not sure I understand the question about the reducer. I'm using a reducer
because my M/R jobs require one, and I want to write the result to HBase.
I have two tables I'm writing to; one uses the default max file size
(256MB if I remember correctly), the other 512MB.
There are ~700 regions on each server.
I didn't know there was a bandwidth limit. Is it on HDFS or HBase? How can
it be configured?

-eran
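
For reference, a minimal sketch of the pattern under discussion: a reduce
phase that writes its output to HBase through TableOutputFormat, wired up
with TableMapReduceUtil.initTableReducerJob. The table name "word_counts",
the column family "f", and the word-count logic are illustrative
assumptions, not Eran's actual job.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class HBaseSinkJob {

  // Emits (token, 1) for every whitespace-separated token in the input.
  static class TokenMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      for (String token : line.toString().split("\\s+")) {
        if (token.isEmpty()) continue;
        word.set(token);
        ctx.write(word, ONE);
      }
    }
  }

  // Sums the counts per key and writes a single Put per key to HBase.
  static class SumReducer extends TableReducer<Text, LongWritable, Text> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context ctx)
        throws IOException, InterruptedException {
      long sum = 0;
      for (LongWritable v : values) {
        sum += v.get();
      }
      Put put = new Put(Bytes.toBytes(key.toString()));
      put.add(Bytes.toBytes("f"), Bytes.toBytes("count"), Bytes.toBytes(sum));
      ctx.write(key, put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "hbase-reduce-sink");
    job.setJarByClass(HBaseSinkJob.class);
    job.setMapperClass(TokenMapper.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    // Sets TableOutputFormat on the job and points it at the target table.
    TableMapReduceUtil.initTableReducerJob("word_counts", SumReducer.class, job);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

initTableReducerJob configures the output format and target table, so the
reducer only has to emit Put objects; whether a reducer is needed at all,
versus writing Puts straight from the map phase, is exactly the question
Mike asks in the quoted message below.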

On Thu, May 10, 2012 at 2:53 PM, Michel Segel <[EMAIL PROTECTED]> wrote:

> Silly question...
> Why are you using a reducer when working with HBase?
>
> Second silly question... What is the max file size of the table you are
> writing to?
>
> Third silly question... How many regions are on each of your region
> servers?
>
> Fourth silly question... There is this bandwidth setting... Default is
> 10MB... Did you modify it?
>
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On May 10, 2012, at 6:33 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
>
> > Thanks Igal, but we already have that setting. These are the relevant
> > settings from hdfs-site.xml:
> >  <property>
> >    <name>dfs.datanode.max.xcievers</name>
> >    <value>65536</value>
> >  </property>
> >  <property>
> >    <name>dfs.datanode.handler.count</name>
> >    <value>10</value>
> >  </property>
> >  <property>
> >    <name>dfs.datanode.socket.write.timeout</name>
> >    <value>0</value>
> >  </property>
> >
> > Other ideas?
> >
> > -eran
> >
> >
> >
> > On Thu, May 10, 2012 at 12:25 PM, Igal Shilman <[EMAIL PROTECTED]> wrote:
> >
> >> Hi Eran,
> >> Do you have dfs.datanode.socket.write.timeout set in hdfs-site.xml?
> >> (We have set this to zero in our cluster, which means waiting as long as
> >> necessary for the write to complete)
> >>
> >> Igal.
> >>
> >> On Thu, May 10, 2012 at 11:17 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
> >>
> >>> Hi,
> >>> We're seeing occasional regionserver crashes during heavy write
> >>> operations to HBase (at the reduce phase of large M/R jobs). I have
> >>> increased the file descriptors, HDFS xceivers, and HDFS thread counts
> >>> to the recommended settings, and actually well above them.
> >>>
> >>> Here is an example of the HBase log (showing only errors):
> >>>
> >>> 2012-05-10 03:34:54,291 WARN org.apache.hadoop.hdfs.DFSClient:
> >>> DFSOutputStream ResponseProcessor exception for block
> >>> blk_-8928911185099340956_5189425 java.io.IOException: Bad response 1
> >>> for block blk_-8928911185099340956_5189425 from datanode
> >>> 10.1.104.6:50010
> >>>       at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2986)
> >>>
> >>> 2012-05-10 03:34:54,494 WARN org.apache.hadoop.hdfs.DFSClient:
> >>> DataStreamer Exception: java.io.InterruptedIOException: Interruped
> >>> while waiting for IO on channel
> >>> java.nio.channels.SocketChannel[connected local=/10.1.104.9:59642
> >>> remote=/10.1.104.9:50010]. 0 millis timeout left.
> >>>       at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:349)
> >>>       at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
> >>>       at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
> >>>       at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
> >>>       at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
> >>>       at java.io.DataOutputStream.write(DataOutputStream.java:90)
> >>>       at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2848)
> >>>
> >>> 2012-05-10 03:34:54,531 WARN org.apache.hadoop.hdfs.DFSClient: Error
> >>> Recovery for block blk_-8928911185099340956_5189425 bad datanode[2]
> >>> 10.1.104.6:50010
> >>> 2012-05-10 03:34:54,531 WARN org.apache.hadoop.hdfs.DFSClient: Error
> >>> Recovery for block blk_-8928911185099340956_5189425 in pipeline
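
Two of the knobs discussed above, for concreteness. The max file size Eran
mentions is hbase.hregion.max.filesize (its stock default in this era was
256MB, matching his recollection; it can also be set per table via
HTableDescriptor.setMaxFileSize). The thread never pins down which
bandwidth setting Mike means; one guess is the HDFS balancer throttle
dfs.balance.bandwidthPerSec, though its stock default is 1MB/s rather than
the 10MB he cites, so treat that mapping as an assumption. A sketch:

  <!-- hbase-site.xml: max store file size per region before it splits.
       536870912 bytes = 512MB, the value Eran uses for his larger table. -->
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>536870912</value>
  </property>

  <!-- hdfs-site.xml: per-datanode throttle on balancer traffic. Only a
       guess at the setting Mike refers to; default is 1048576 (1MB/s). -->
  <property>
    <name>dfs.balance.bandwidthPerSec</name>
    <value>10485760</value>
  </property>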
Michael Segel 2012-05-10, 13:26
Dave Revell 2012-05-10, 17:31
Michael Segel 2012-05-10, 18:30
Dave Revell 2012-05-10, 18:41
Michael Segel 2012-05-10, 18:59
Eran Kutner 2012-05-10, 19:17
Michael Segel 2012-05-10, 19:50
Stack 2012-05-10, 21:57
Michael Segel 2012-05-11, 02:46
Stack 2012-05-11, 03:34
Michael Segel 2012-05-11, 01:28
Stack 2012-05-11, 03:28
Michael Segel 2012-05-11, 03:44
Stack 2012-05-11, 03:53
Stack 2012-05-11, 05:12
Michael Segel 2012-05-11, 11:36
Eran Kutner 2012-05-24, 11:15
Michael Segel 2012-05-24, 12:13
Stack 2012-05-24, 23:39
Dave Revell 2012-05-25, 19:52
Stack 2012-05-11, 05:08