HBase user mailing list: Occasional regionserver crashes following socket errors writing to HDFS


Eran Kutner 2012-05-10, 08:17
dva 2012-08-30, 06:26
Stack 2012-08-30, 22:36
Stack 2012-05-11, 05:07
Igal Shilman 2012-05-10, 09:25
Eran Kutner 2012-05-10, 11:33
Michel Segel 2012-05-10, 11:53
Re: Occasional regionserver crashes following socket errors writing to HDFS
Hi Mike,
Not sure I understand the question about the reducer. I'm using a reducer
because my M/R jobs require one and I want to write the result to HBase.
I have two tables I'm writing to; one is using the default file size
(256MB if I remember correctly), the other one is 512MB.
There are ~700 regions on each server.
Didn't know there is a bandwidth limit. Is it on HDFS or HBase? How can it
be configured?

-eran
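
For context on the sizes mentioned above: the split threshold Eran is describing
is the hbase.hregion.max.filesize property (256MB on his cluster, per his note),
and it can also be raised per table through the admin API. A minimal sketch,
assuming a 0.90/0.92-era client and a placeholder table name "usage_data" that
is not from the thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class RaiseMaxFileSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    byte[] table = Bytes.toBytes("usage_data");   // placeholder table name

    HTableDescriptor desc = admin.getTableDescriptor(table);
    System.out.println("current MAX_FILESIZE: " + desc.getMaxFileSize());

    // A region splits once a store file passes this limit, so a larger value
    // means splits happen less often and the ~700-regions-per-server count
    // stops growing as quickly. 512MB here mirrors Eran's second table.
    desc.setMaxFileSize(512L * 1024 * 1024);

    admin.disableTable(table);    // schema changes were commonly applied with the table offline
    admin.modifyTable(table, desc);
    admin.enableTable(table);
  }
}

Raising the limit only affects future splits; existing regions stay as they are.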

On Thu, May 10, 2012 at 2:53 PM, Michel Segel <[EMAIL PROTECTED]> wrote:

> Silly question...
> Why are you using a reducer when working w HBase?
>
> Second silly question... What is the max file size of your table that you
> are writing to?
>
> Third silly question... How many regions are on each of your region servers?
>
> Fourth silly question... There is this bandwidth setting... Default is
> 10MB... Did you modify it?
>
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On May 10, 2012, at 6:33 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
>
> > Thanks Igal, but we already have that setting. These are the relevant
> > settings from hdfs-site.xml:
> >  <property>
> >    <name>dfs.datanode.max.xcievers</name>
> >    <value>65536</value>
> >  </property>
> >  <property>
> >    <name>dfs.datanode.handler.count</name>
> >    <value>10</value>
> >  </property>
> >  <property>
> >    <name>dfs.datanode.socket.write.timeout</name>
> >    <value>0</value>
> >  </property>
> >
> > Other ideas?
> >
> > -eran
> >
> >
> >
> > On Thu, May 10, 2012 at 12:25 PM, Igal Shilman <[EMAIL PROTECTED]> wrote:
> >
> >> Hi Eran,
> >> Do you have: dfs.datanode.socket.write.timeout set in hdfs-site.xml ?
> >> (We have set this to zero in our cluster, which means waiting as long as
> >> necessary for the write to complete)
> >>
> >> Igal.
> >>
> >> On Thu, May 10, 2012 at 11:17 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
> >>
> >>> Hi,
> >>> We're seeing occasional regionserver crashes during heavy write operations
> >>> to HBase (at the reduce phase of large M/R jobs). I have increased the file
> >>> descriptors, HDFS xceivers, HDFS threads to the recommended settings and
> >>> actually way above.
> >>>
> >>> Here is an example of the HBase log (showing only errors):
> >>>
> >>> 2012-05-10 03:34:54,291 WARN org.apache.hadoop.hdfs.DFSClient:
> >>> DFSOutputStream ResponseProcessor exception  for block
> >>> blk_-8928911185099340956_5189425java.io.IOException: Bad response 1 for
> >>> block blk_-8928911185099340956_5189425 from datanode 10.1.104.6:50010
> >>>       at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2986)
> >>>
> >>> 2012-05-10 03:34:54,494 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
> >>> Exception: java.io.InterruptedIOException: Interruped while waiting for IO
> >>> on channel java.nio.channels.SocketChannel[connected
> >>> local=/10.1.104.9:59642 remote=/10.1.104.9:50010]. 0 millis timeout left.
> >>>       at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:349)
> >>>       at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
> >>>       at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
> >>>       at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
> >>>       at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
> >>>       at java.io.DataOutputStream.write(DataOutputStream.java:90)
> >>>       at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2848)
> >>>
> >>> 2012-05-10 03:34:54,531 WARN org.apache.hadoop.hdfs.DFSClient: Error
> >>> Recovery for block blk_-8928911185099340956_5189425 bad datanode[2]
> >>> 10.1.104.6:50010
> >>> 2012-05-10 03:34:54,531 WARN org.apache.hadoop.hdfs.DFSClient: Error
> >>> Recovery for block blk_-8928911185099340956_5189425 in pipeline
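
Mike's first question above (why use a reducer at all when the output goes to
HBase) alludes to the common pattern of writing Puts directly from the map phase
through TableOutputFormat, which avoids the shuffle entirely. A minimal sketch of
such a map-only load; the class names, the table name "my_table", the column
family "d", and the tab-separated input are illustrative assumptions rather than
details from this thread:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class MapOnlyHBaseLoad {

  // Mapper emits (row key, Put); with zero reducers these go straight to
  // TableOutputFormat, so no shuffle or reduce phase is involved.
  static class LoadMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] f = line.toString().split("\t");    // assumed tab-separated input
      Put put = new Put(Bytes.toBytes(f[0]));      // first field as row key
      put.add(Bytes.toBytes("d"), Bytes.toBytes("v"), Bytes.toBytes(f[1]));
      ctx.write(new ImmutableBytesWritable(put.getRow()), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "map-only-hbase-load");
    job.setJarByClass(MapOnlyHBaseLoad.class);
    job.setMapperClass(LoadMapper.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));

    // Wires up TableOutputFormat for the target table; a null reducer plus
    // zero reduce tasks makes the mappers write directly to the table.
    TableMapReduceUtil.initTableReducerJob("my_table", null, job);
    job.setNumReduceTasks(0);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Whether that pattern fits here depends on whether Eran's jobs genuinely need the
reduce-side work he mentions before the results reach HBase.
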
Michael Segel 2012-05-10, 13:26
Dave Revell 2012-05-10, 17:31
Michael Segel 2012-05-10, 18:30
Dave Revell 2012-05-10, 18:41
Michael Segel 2012-05-10, 18:59
Eran Kutner 2012-05-10, 19:17
Michael Segel 2012-05-10, 19:50
Stack 2012-05-10, 21:57
Michael Segel 2012-05-11, 02:46
Stack 2012-05-11, 03:34
Michael Segel 2012-05-11, 01:28
Stack 2012-05-11, 03:28
Michael Segel 2012-05-11, 03:44
Stack 2012-05-11, 03:53
Stack 2012-05-11, 05:12
Michael Segel 2012-05-11, 11:36
Eran Kutner 2012-05-24, 11:15
Michael Segel 2012-05-24, 12:13
Stack 2012-05-24, 23:39
Dave Revell 2012-05-25, 19:52
Stack 2012-05-11, 05:08