

Re: Region servers going down under heavy write load
In zoo.cfg I have not set this value explicitly. My zoo.cfg looks like:

tickTime=2000
initLimit=10
syncLimit=5

We use a common ZooKeeper cluster for 2 of our HBase clusters. I'll try
increasing this value in zoo.cfg.
However, is it possible to set this value per cluster?
I thought this property in hbase-site.xml takes care of that:
zookeeper.session.timeout
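
For context: zookeeper.session.timeout in hbase-site.xml is the timeout each HBase cluster requests, so it is per cluster, but the ZooKeeper ensemble negotiates the actual session timeout into the range minSessionTimeout..maxSessionTimeout, which default to 2x and 20x tickTime. With tickTime=2000 that caps sessions at 40000 ms regardless of what HBase asks for. A sketch of a zoo.cfg that raises the ceiling, reusing the values from this thread plus an illustrative maxSessionTimeout:

tickTime=2000
initLimit=10
syncLimit=5
# ZooKeeper defaults maxSessionTimeout to 20 * tickTime = 40000 ms;
# raise it so the 300000 ms that HBase requests can actually take effect.
maxSessionTimeout=300000
# minSessionTimeout defaults to 2 * tickTime and can stay as-is.

Note that this ceiling lives on the shared ensemble, so it applies to every cluster using that quorum; only the requested value in hbase-site.xml is cluster-specific.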
On Wed, Jun 5, 2013 at 1:49 PM, Kevin O'Dell <[EMAIL PROTECTED]> wrote:

> Ameya,
>
>   What does your zoo.cfg say for your timeout value?
>
>
> On Wed, Jun 5, 2013 at 4:47 PM, Ameya Kantikar <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > We have heavy map reduce write jobs running against our cluster. Every once
> > in a while, we see a region server going down.
> >
> > We are on: 0.94.2-cdh4.2.0, r
> >
> > We have done some tuning for heavy map reduce jobs, and have increased
> > scanner timeouts and lease timeouts, and have also tuned the memstore as follows:
> >
> > hbase.hregion.memstore.block.multiplier: 4
> > hbase.hregion.memstore.flush.size: 134217728
> > hbase.hstore.blockingStoreFiles: 100
> >
> > So now, we are still facing issues. Looking at the logs, it looks like it is
> > due to a ZooKeeper timeout. We have tuned ZooKeeper settings as follows in
> > hbase-site.xml:
> >
> > zookeeper.session.timeout: 300000
> > hbase.zookeeper.property.tickTime: 6000
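
For reference, here is how the settings quoted above would be written out in hbase-site.xml; this is only a sketch restating the same values. One caveat, assuming the shared quorum is not managed by HBase itself: hbase.zookeeper.property.* entries are only applied to a ZooKeeper instance that HBase launches, so the tickTime below would not reach an external ensemble.

<property>
  <name>zookeeper.session.timeout</name>
  <value>300000</value> <!-- the timeout HBase requests; the ensemble may negotiate it down -->
</property>
<property>
  <name>hbase.zookeeper.property.tickTime</name>
  <value>6000</value> <!-- only honoured when HBase manages its own quorum -->
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>4</value>
</property>
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value> <!-- 128 MB -->
</property>
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>100</value>
</property>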
> >
> >
> > The actual log looks like:
> >
> >
> > 2013-06-05 11:46:40,405 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow):
> > {"processingtimems":13468,"call":"next(6723331143689528698, 1000), rpc version=1,
> > client version=29, methodsFingerPrint=54742778","client":"10.20.73.65:41721",
> > "starttimems":1370432786933,"queuetimems":1,"class":"HRegionServer",
> > "responsesize":39611416,"method":"next"}
> >
> > 2013-06-05 11:46:54,988 INFO org.apache.hadoop.io.compress.CodecPool: Got
> > brand-new decompressor [.snappy]
> >
> > 2013-06-05 11:48:03,017 WARN org.apache.hadoop.hdfs.DFSClient:
> > DFSOutputStream ResponseProcessor exception for block
> > BP-53741567-10.20.73.56-1351630463427:blk_9026156240355850298_8775246
> > java.io.EOFException: Premature EOF: no length prefix available
> >         at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:162)
> >         at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:95)
> >         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:656)
> >
> > 2013-06-05 11:48:03,020 WARN org.apache.hadoop.hbase.util.Sleeper: *We
> > slept 48686ms instead of 3000ms*, this is likely due to a long garbage
> > collecting pause and it's usually bad, see
> > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> >
> > 2013-06-05 11:48:03,094 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer:
> > ABORTING region server smartdeals-hbase14-snc1.snc1,60020,1370373396890:
> > Unhandled exception: org.apache.hadoop.hbase.YouAreDeadException:
> > Server REPORT rejected; currently processing
> > smartdeals-hbase14-snc1.snc1,60020,1370373396890 as dead server
> >
> > (Not sure why it says 3000ms when we have timeout at 300000ms)
> >
> > We have done some GC tuning as well. Wondering what I can tune to keep the
> > RS from going down? Any ideas?
> > This is a batch-heavy cluster, and we care less about read latency. We can
> > increase RAM a bit more, but not much (the RS already has 20GB of memory).
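
For what it's worth, a 48-second pause like the Sleeper warning above usually points at a stop-the-world collection on the 20 GB heap. A generic CMS-style starting point for hbase-env.sh is sketched below; the flags are standard HotSpot options along the lines of what the HBase book suggests for write-heavy region servers, and the heap/new-gen sizes and log path are illustrative, not taken from this cluster.

# hbase-env.sh -- illustrative CMS settings for a ~20 GB region server heap
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
  -Xms20g -Xmx20g -Xmn512m \
  -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
  -Xloggc:/var/log/hbase/gc-regionserver.log"

The GC log at least shows whether the expired ZooKeeper session lines up with a concurrent-mode or promotion failure.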
> >
> > Thanks in advance.
> >
> > Ameya
> >
>
>
>
> --
> Kevin O'Dell
> Systems Engineer, Cloudera
>