HBase user mailing list: Region servers going down under heavy write load


Earlier messages in this thread:
Ameya Kantikar 2013-06-05, 20:47
Kevin O'Dell 2013-06-05, 20:49
Ameya Kantikar 2013-06-05, 21:34
Ted Yu 2013-06-05, 21:45
Re: Region servers going down under heavy write load
Which tickTime is honored?

The one in zoo.cfg, or hbase.zookeeper.property.tickTime in hbase-site.xml?

My understanding now is that, whichever tickTime is honored, the session timeout
cannot be more than 20 times that value.

I think this is what's happening on my cluster:

My hbase.zookeeper.property.tickTime value is 6000 ms, but my requested timeout
value is 300000 ms, which is outside the 20x tickTime window. Hence ZooKeeper
uses its syncLimit of 5 to arrive at 6000 * 5 = 30000 ms as the timeout value for my
RS sessions.
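
To confirm which value the quorum actually grants, Ted's pointer about the client
API suggests a quick standalone check. Here is a rough sketch (the connect string
is a placeholder for our shared quorum, and the requested timeout mirrors our
zookeeper.session.timeout):

import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.KeeperState;
import org.apache.zookeeper.ZooKeeper;

// Rough sketch: request the same timeout the region servers request and
// print the value the quorum actually negotiates back.
public class NegotiatedTimeoutCheck {
  public static void main(String[] args) throws Exception {
    final CountDownLatch connected = new CountDownLatch(1);
    ZooKeeper zk = new ZooKeeper(
        "zk1:2181,zk2:2181,zk3:2181",  // placeholder connect string
        300000,                        // requested timeout, as in zookeeper.session.timeout
        new Watcher() {
          public void process(WatchedEvent event) {
            if (event.getState() == KeeperState.SyncConnected) {
              connected.countDown();
            }
          }
        });
    connected.await();
    // getSessionTimeout() returns the timeout negotiated with the server,
    // which the server caps based on its tickTime.
    System.out.println("Requested 300000 ms, negotiated "
        + zk.getSessionTimeout() + " ms");
    zk.close();
  }
}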

I'll try increasing the hbase.zookeeper.property.tickTime value in
hbase-site.xml and will monitor my cluster over the next few days.
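
Concretely, the change I have in mind looks something like the following in
hbase-site.xml (the numbers are only an illustration of the constraint, and this
assumes the HBase-managed tickTime is the one that is honored):

hbase.zookeeper.property.tickTime: 15000
zookeeper.session.timeout: 300000

Since 300000 ms is exactly 20 * 15000 ms, the requested timeout would then fall
within the negotiable range.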

Thanks Kevin & Ted for your help.

Ameya
On Wed, Jun 5, 2013 at 2:45 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> bq. I thought this property in hbase-site.xml takes care of that:
> zookeeper.session.timeout
>
> From
> http://zookeeper.apache.org/doc/current/zookeeperProgrammers.html#ch_zkSessions :
>
> The client sends a requested timeout, the server responds with the timeout
> that it can give the client. The current implementation requires that the
> timeout be a minimum of 2 times the tickTime (as set in the server
> configuration) and a maximum of 20 times the tickTime. The ZooKeeper client
> API allows access to the negotiated timeout.
> The above means the shared ZooKeeper quorum may return a timeout value
> different from that of zookeeper.session.timeout.
>
> Cheers
>
> On Wed, Jun 5, 2013 at 2:34 PM, Ameya Kantikar <[EMAIL PROTECTED]> wrote:
>
> > In zoo.cfg I have not set this value explicitly. My zoo.cfg looks like:
> >
> > tickTime=2000
> > initLimit=10
> > syncLimit=5
> >
> > We use a common ZooKeeper cluster for 2 of our HBase clusters. I'll try
> > increasing this value in zoo.cfg.
> > However, is it possible to set this value per cluster?
> > I thought this property in hbase-site.xml takes care of that:
> > zookeeper.session.timeout
> >
> >
> > On Wed, Jun 5, 2013 at 1:49 PM, Kevin O'Dell <[EMAIL PROTECTED]> wrote:
> >
> > > Ameya,
> > >
> > >   What does your zoo.cfg say for your timeout value?
> > >
> > >
> > > On Wed, Jun 5, 2013 at 4:47 PM, Ameya Kantikar <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi,
> > > >
> > > > We have heavy MapReduce write jobs running against our cluster. Every
> > > > once in a while, we see a region server going down.
> > > >
> > > > We are on: 0.94.2-cdh4.2.0, r
> > > >
> > > > We have done some tuning for heavy MapReduce jobs, and have increased
> > > > scanner timeouts and lease timeouts, and have also tuned the memstore as
> > > > follows:
> > > >
> > > > hbase.hregion.memstore.block.multiplier: 4
> > > > hbase.hregion.memstore.flush.size: 134217728
> > > > hbase.hstore.blockingStoreFiles: 100
> > > >
> > > > So now, we are still facing issues. Looking at the logs, it looks like
> > > > it is due to a ZooKeeper timeout. We have tuned ZooKeeper settings as
> > > > follows in hbase-site.xml:
> > > >
> > > > zookeeper.session.timeout: 300000
> > > > hbase.zookeeper.property.tickTime: 6000
> > > >
> > > > The actual log looks like:
> > > >
> > > > 2013-06-05 11:46:40,405 WARN org.apache.hadoop.ipc.HBaseServer:
> > > > (responseTooSlow):
> > > > {"processingtimems":13468,"call":"next(6723331143689528698, 1000), rpc
> > > > version=1, client version=29, methodsFingerPrint=54742778","client":"
> > > > 10.20.73.65:41721","starttimems":1370432786933,"queuetimems":1,"class":"HRegionServer","responsesize":39611416,"method":"next"}
> > > >
> > > > 2013-06-05 11:46:54,988 INFO org.apache.hadoop.io.compress.CodecPool: Got
> > > > brand-new decompressor [.snappy]
> > > >
> > > > 2013-06-05 11:48:03,017 WARN org.apache.hadoop.hdfs.DFSClient:
> > > > DFSOutputStream ResponseProcessor exception for block
> > > > BP-53741567-10.20.73.56-1351630463427:blk_9026156240355850298_8775246
> > > > java.io.EOFException: Premature EOF: no length prefix available
> > > >         at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:162)
Later messages in this thread:
Ameya Kantikar 2013-06-06, 00:45
Ted Yu 2013-06-06, 02:57
Stack 2013-06-06, 06:15
Stack 2013-06-06, 06:21
Ted Yu 2013-06-06, 16:33