HBase >> mail # user >> Region servers going down under heavy write load


Ameya Kantikar 2013-06-05, 20:47
Kevin Odell 2013-06-05, 20:49
Ameya Kantikar 2013-06-05, 21:34
Ted Yu 2013-06-05, 21:45
Re: Region servers going down under heavy write load
Which tickTime is honored?

The one in zoo.cfg, or hbase.zookeeper.property.tickTime in hbase-site.xml?

My understanding now is that whichever tickTime is honored, the session
timeout cannot be more than 20 times that value.
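
If that reading is right, the clamping would look roughly like this (just my
own illustration of the rule from the ZooKeeper docs quoted below, not
ZooKeeper's actual code):

public class SessionTimeoutClamp {
    // The docs say the granted session timeout must stay between
    // 2x and 20x the server-side tickTime.
    static int clamp(int requestedMs, int tickTimeMs) {
        int minMs = 2 * tickTimeMs;
        int maxMs = 20 * tickTimeMs;
        return Math.max(minMs, Math.min(maxMs, requestedMs));
    }

    public static void main(String[] args) {
        System.out.println(clamp(300000, 6000)); // 120000 ms granted
        System.out.println(clamp(300000, 2000)); // 40000 ms with the zoo.cfg tickTime
    }
}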

I think this is what's happening on my cluster:

My hbase.zookeeper.property.tickTime value is 6000 ms. However, my timeout
value is 300000 ms, which is more than 20 times the tickTime. Hence ZooKeeper
uses its syncLimit of 5 to generate 6000*5 = 30000 ms as the timeout value for
my RS sessions.

I'll try increasing the hbase.zookeeper.property.tickTime value in
hbase-site.xml and will monitor my cluster over the next few days.
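
To verify which value actually wins, the negotiated timeout can also be read
back from a plain ZooKeeper client session. A minimal sketch (the connect
string and the requested timeout here are placeholders, not our real config):

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class CheckNegotiatedTimeout {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 300000,
                new Watcher() {
                    public void process(WatchedEvent event) { }
                });
        Thread.sleep(2000); // crude wait for the session handshake to finish
        // Once connected, this returns the timeout the server actually granted,
        // which may be lower than the 300000 ms requested above.
        System.out.println("Negotiated session timeout: "
                + zk.getSessionTimeout() + " ms");
        zk.close();
    }
}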

Thanks Kevin & Ted for your help.

Ameya
On Wed, Jun 5, 2013 at 2:45 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> bq. I thought this property in hbase-site.xml takes care of that:
> zookeeper.session.timeout
>
> From
> http://zookeeper.apache.org/doc/current/zookeeperProgrammers.html#ch_zkSessions :
>
> The client sends a requested timeout, the server responds with the timeout
> that it can give the client. The current implementation requires that the
> timeout be a minimum of 2 times the tickTime (as set in the server
> configuration) and a maximum of 20 times the tickTime. The ZooKeeper client
> API allows access to the negotiated timeout.
> The above means the shared ZooKeeper quorum may return a timeout value
> different from that of zookeeper.session.timeout.
>
> Cheers
>
> On Wed, Jun 5, 2013 at 2:34 PM, Ameya Kantikar <[EMAIL PROTECTED]> wrote:
>
> > In zoo.cfg I have not set up this value explicitly. My zoo.cfg looks like:
> >
> > tickTime=2000
> > initLimit=10
> > syncLimit=5
> >
> > We use a common ZooKeeper cluster for 2 of our HBase clusters. I'll try
> > increasing this value in zoo.cfg.
> > However, is it possible to set this value per cluster?
> > I thought this property in hbase-site.xml takes care of that:
> > zookeeper.session.timeout
> >
> >
> > On Wed, Jun 5, 2013 at 1:49 PM, Kevin O'dell <[EMAIL PROTECTED]> wrote:
> >
> > > Ameya,
> > >
> > >   What does your zoo.cfg say for your timeout value?
> > >
> > >
> > > On Wed, Jun 5, 2013 at 4:47 PM, Ameya Kantikar <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi,
> > > >
> > > > We have heavy map reduce write jobs running against our cluster. Every
> > > > once in a while, we see a region server going down.
> > > >
> > > > We are on: 0.94.2-cdh4.2.0, r
> > > >
> > > > We have done some tuning for heavy map reduce jobs, and have increased
> > > > scanner timeouts and lease timeouts, and have also tuned the memstore as follows:
> > > >
> > > > hbase.hregion.memstore.block.multiplier: 4
> > > > hbase.hregion.memstore.flush.size: 134217728
> > > > hbase.hstore.blockingStoreFiles: 100
> > > >
> > > > So now, we are still facing issues. Looking at the logs, it looks like it is
> > > > due to ZooKeeper timeouts. We have tuned the ZooKeeper settings as follows in
> > > > hbase-site.xml:
> > > >
> > > > zookeeper.session.timeout: 300000
> > > > hbase.zookeeper.property.tickTime: 6000
> > > >
> > > >
> > > > The actual log looks like:
> > > >
> > > >
> > > > 2013-06-05 11:46:40,405 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":13468,"call":"next(6723331143689528698, 1000), rpc version=1, client version=29, methodsFingerPrint=54742778","client":"10.20.73.65:41721","starttimems":1370432786933,"queuetimems":1,"class":"HRegionServer","responsesize":39611416,"method":"next"}
> > > >
> > > > 2013-06-05 11:46:54,988 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.snappy]
> > > >
> > > > 2013-06-05 11:48:03,017 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block BP-53741567-10.20.73.56-1351630463427:blk_9026156240355850298_8775246
> > > > java.io.EOFException: Premature EOF: no length prefix available
> > > >         at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:162)
Ameya Kantikar 2013-06-06, 00:45
Ted Yu 2013-06-06, 02:57
Stack 2013-06-06, 06:15
Stack 2013-06-06, 06:21
Ted Yu 2013-06-06, 16:33