HBase, mail # user - Region servers going down under heavy write load


Re: Region servers going down under heavy write load
Ameya Kantikar 2013-06-06, 00:45
One more thing: I just don't find "hbase.zookeeper.property.tickTime"
anywhere in the code base.
Also, I could not find a ZooKeeper client API that takes a tickTime:
http://zookeeper.apache.org/doc/r3.3.3/api/org/apache/zookeeper/ZooKeeper.html
It takes a session timeout value, but not a tickTime.
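A quick sketch of what I mean, using just the plain client API (placeholder
quorum address, only illustrative):

import java.io.IOException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ZkClientSketch {
  public static void main(String[] args) throws IOException, InterruptedException {
    // "zkhost:2181" is a placeholder. The 300000 here is the *requested*
    // session timeout in ms; there is no tickTime argument on the client
    // side -- tickTime only exists in the server configuration (zoo.cfg).
    ZooKeeper zk = new ZooKeeper("zkhost:2181", 300000, new Watcher() {
      public void process(WatchedEvent event) { /* no-op watcher */ }
    });
    zk.close();
  }
}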

Is this even relevant anymore: hbase.zookeeper.property.tickTime?

So what's the solution: increase tickTime in zoo.cfg (and not
hbase.zookeeper.property.tickTime in hbase-site.xml)?
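If the 20x ceiling Ted quoted below is what applies on the shared quorum, then
for a 300000 ms session to even be negotiable the server-side tickTime would
need to be at least 300000 / 20 = 15000 ms, i.e. something like this in
zoo.cfg (an illustrative value, not something we have tested):

tickTime=15000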

Ameya
On Wed, Jun 5, 2013 at 3:18 PM, Ameya Kantikar <[EMAIL PROTECTED]> wrote:

> Which tickTime is honored?
>
> The one in zoo.cfg, or hbase.zookeeper.property.tickTime in hbase-site.xml?
>
> My understanding now is that whichever tickTime is honored, the session timeout
> cannot be more than 20 times that value.
>
> I think this is what's happening on my cluster:
>
> My hbase.zookeeper.property.tickTime value is 6000 ms. However, my timeout
> value is 300000 ms, which is outside of 20 times the tickTime. Hence ZooKeeper
> uses its syncLimit of 5 to generate 6000 * 5 = 30000 ms as the timeout value
> for my RS sessions.
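> (Spelling out that arithmetic: 20 x 6000 ms = 120000 ms, which is below the
> requested 300000 ms, while 5 x 6000 ms = 30000 ms.)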
>
> I'll try increasing the hbase.zookeeper.property.tickTime value in
> hbase-site.xml and will monitor my cluster over the next few days.
>
> Thanks Kevin & Ted for your help.
>
> Ameya
>
>
>
>
> On Wed, Jun 5, 2013 at 2:45 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
>> bq. I thought this property in hbase-site.xml takes care of that:
>> zookeeper.session.timeout
>>
>> From
>>
>> http://zookeeper.apache.org/doc/current/zookeeperProgrammers.html#ch_zkSessions
>> :
>>
>> The client sends a requested timeout, the server responds with the timeout
>> that it can give the client. The current implementation requires that the
>> timeout be a minimum of 2 times the tickTime (as set in the server
>> configuration) and a maximum of 20 times the tickTime. The ZooKeeper client
>> API allows access to the negotiated timeout.
>> The above means the shared ZooKeeper quorum may return a timeout value
>> different from zookeeper.session.timeout.
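>>
>> A minimal way to check this from the client side (a sketch, assuming an
>> already-connected org.apache.zookeeper.ZooKeeper instance named "zk"):
>>
>> // Returns the session timeout the server actually granted, in ms; it can
>> // be lower than the 300000 ms that was requested.
>> int negotiated = zk.getSessionTimeout();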
>>
>> Cheers
>>
>> On Wed, Jun 5, 2013 at 2:34 PM, Ameya Kantikar <[EMAIL PROTECTED]> wrote:
>>
>> > In zoo.cfg I have not set up this value explicitly. My zoo.cfg looks like:
>> >
>> > tickTime=2000
>> > initLimit=10
>> > syncLimit=5
>> >
>> > We use a common ZooKeeper cluster for 2 of our HBase clusters. I'll try
>> > increasing this value in zoo.cfg.
>> > However, is it possible to set this value per cluster?
>> > I thought this property in hbase-site.xml takes care of that:
>> > zookeeper.session.timeout
>> >
>> >
>> > On Wed, Jun 5, 2013 at 1:49 PM, Kevin O'dell <[EMAIL PROTECTED]> wrote:
>> >
>> > > Ameya,
>> > >
>> > >   What does your zoo.cfg say for your timeout value?
>> > >
>> > >
>> > > On Wed, Jun 5, 2013 at 4:47 PM, Ameya Kantikar <[EMAIL PROTECTED]> wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > We have heavy MapReduce write jobs running against our cluster. Every
>> > > > once in a while, we see a region server going down.
>> > > >
>> > > > We are on : 0.94.2-cdh4.2.0, r
>> > > >
>> > > > We have done some tuning for heavy MapReduce jobs: we have increased
>> > > > scanner timeouts and lease timeouts, and have also tuned the memstore
>> > > > as follows:
>> > > >
>> > > > hbase.hregion.memstore.block.multiplier: 4
>> > > > hbase.hregion.memstore.flush.size: 134217728
>> > > > hbase.hstore.blockingStoreFiles: 100
>> > > >
>> > > > So now, we are still facing issues. Looking at the logs, it looks like it is
>> > > > due to a ZooKeeper timeout. We have tuned ZooKeeper settings as follows in
>> > > > hbase-site.xml:
>> > > >
>> > > > zookeeper.session.timeout: 300000
>> > > > hbase.zookeeper.property.tickTime: 6000
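>> > > >
>> > > > In hbase-site.xml form these are set roughly as:
>> > > >
>> > > > <property>
>> > > >   <name>zookeeper.session.timeout</name>
>> > > >   <value>300000</value>
>> > > > </property>
>> > > > <property>
>> > > >   <name>hbase.zookeeper.property.tickTime</name>
>> > > >   <value>6000</value>
>> > > > </property>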
>> > > >
>> > > >
>> > > > The actual log looks like:
>> > > >
>> > > >
>> > > > 2013-06-05 11:46:40,405 WARN org.apache.hadoop.ipc.HBaseServer:
>> > > > (responseTooSlow):
>> > > > {"processingtimems":13468,"call":"next(6723331143689528698, 1000),
>> rpc
>> > > > version=1, client version=29,