HBase >> mail # user >> Handling regionserver crashes in production cluster


RE: Handling regionserver crashes in production cluster
We are facing the same problem. Is it fixed in HBase 0.94.8 or 0.97.6?
If it is fixed, we will migrate. Can someone confirm this?
Thanks,
Sandeep
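
For readers following the test scenario quoted below (a table pre-split on a to z, a scan from "adidas" to "adidas1", and a stopped region server that hosted the c and d regions), here is a minimal sketch in plain Java of which regions such a scan range should overlap. It does not use the HBase client API. The class and method names are hypothetical, regions are modeled only by their start keys "a" through "z", and keys are compared as Java strings, which matches byte order only for ASCII row keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model, not HBase code: a table pre-split into 26 regions,
// each identified by its start key "a" .. "z".
public class ScanRegionSketch {

    // Builds the sorted set of region start keys "a" .. "z".
    static TreeSet<String> regionStartKeys() {
        TreeSet<String> starts = new TreeSet<>();
        for (char c = 'a'; c <= 'z'; c++) {
            starts.add(String.valueOf(c));
        }
        return starts;
    }

    // Returns the start keys of the regions that overlap [startRow, stopRow).
    // Assumes non-empty keys; an empty or inverted range yields no regions.
    static List<String> regionsForScan(String startRow, String stopRow) {
        TreeSet<String> starts = regionStartKeys();
        if (stopRow.compareTo(startRow) <= 0) {
            return new ArrayList<>();
        }
        // The region containing startRow has the greatest start key <= startRow.
        String first = starts.floor(startRow);
        if (first == null) {
            // Rows sorting before "a" are treated as part of the first region.
            first = starts.first();
        }
        // Every region whose start key lies in [first, stopRow) overlaps the scan.
        return new ArrayList<>(starts.subSet(first, true, stopRow, false));
    }

    public static void main(String[] args) {
        List<String> down = List.of("c", "d"); // regions on the stopped server
        List<String> touched = regionsForScan("adidas", "adidas1");
        System.out.println("Regions touched by the scan: " + touched);
        boolean overlapsDown = touched.stream().anyMatch(down::contains);
        System.out.println("Overlaps a region in transition: " + overlapsDown);
    }
}
```

Under these assumptions the scan range falls entirely inside the region starting at "a", so it does not overlap the regions that were in transition. That is why the slowdown reported in the thread is unexpected.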

> From: [EMAIL PROTECTED]
> Date: Thu, 13 Jun 2013 09:00:46 +0200
> Subject: Re: Handling regionserver crashes in production cluster
> To: [EMAIL PROTECTED]
>
> Hum... So even a simple get shows the issue?
> It would be a (surprising) critical bug. Could you please try 0.95.1 or
> 0.94.8? Or write a unit test?
>
> Thanks,
>
> Nicolas
>
>
> On Thu, Jun 13, 2013 at 5:43 AM, kiran <[EMAIL PROTECTED]> wrote:
>
> > It's a simple kill...
> > The scan uses a start row and a stop row:
> > Scan scan = new Scan(Bytes.toBytes("adidas"), Bytes.toBytes("adidas1"));
> >
> >
> > Our cluster size is 15. The load average shown in the master is 78%, so it
> > is not that overloaded, but writes are happening in the cluster...
> >
> > Thanks
> > Kiran
> >
> >
> >
> > On Wed, Jun 12, 2013 at 10:49 PM, Nicolas Liochon <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Yeah, it should not block the other regions.
> > >
> > > For the region server, was it a kill -9 or a simple kill (the former
> > > triggers a recovery, the latter will close the regions before stopping the
> > > process)?
> > >
> > > How do you select the scan scope? With stop/start rows?
> > > Can you share the client code you're using?
> > > What's the cluster size? Was it already very loaded before you killed the
> > > region server?
> > >
> > > Nicolas
> > >
> > >
> > >
> > > On Wed, Jun 12, 2013 at 6:11 PM, kiran <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > Yes, we killed the region server, but the datanode is still running on the
> > > > node...
> > > >
> > > > Sample test scenario: Assume I have a table with pre-splits a up to z
> > > > (about 26 regions). I purposefully brought down the region server hosting
> > > > the regions with prefixes c and d. Then I used the client API to scan data
> > > > from regions with prefixes other than c and d. The response was very slow
> > > > and sometimes did not come at all.
> > > >
> > > > My doubt is: if only the regions with prefixes c and d are being relocated
> > > > or are in transition, why does it affect the regions with other prefixes?
> > > > Once the region transition is over, the response is very fast, as expected.
> > > >
> > > >
> > > >
> > > > On Wed, Jun 12, 2013 at 8:50 PM, rajesh babu chintaguntla <
> > > > [EMAIL PROTECTED]> wrote:
> > > >
> > > > > You can set the property below to a higher value to close more regions
> > > > > at a time.
> > > > >
> > > > >   <property>
> > > > >     <name>hbase.regionserver.executor.closeregion.threads</name>
> > > > >     <value>3</value>
> > > > >   </property>
> > > > >
> > > > >
> > > > > On Wed, Jun 12, 2013 at 7:38 PM, Nicolas Liochon <[EMAIL PROTECTED]>
> > > > > wrote:
> > > > >
> > > > > > What was your test exactly? You killed -9 a region server but kept the
> > > > > > datanode alive?
> > > > > > Could you detail the queries you were doing?
> > > > > >
> > > > > >
> > > > > > On Wed, Jun 12, 2013 at 2:10 PM, kiran <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > It is not possible for us to migrate to a new version immediately.
> > > > > > >
> > > > > > > @Anoop we purposefully brought down one regionserver, then we observed
> > > > > > > that the website was taking too much time to respond. We observed the
> > > > > > > pattern for about 5 min, until the regions were relocated.
> > > > > > > Also, the queries we issued from our website were chosen so that they
> > > > > > > did not touch the regions on the regionserver we brought down.
> > > > > > >
> > > > > > > Is there any configuration workaround to mitigate it?
> > > > > > >
> > > > > > > Thanks
> > > > > > > Kiran
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Jun 6, 2013 at 8:27 PM, Jean-Marc Spaggiari <
> > > > > > > [EMAIL PROTECTED]
>      