HBase >> mail # user >> Handling regionserver crashes in production cluster


kiran 2013-06-05, 17:20
Anoop John 2013-06-06, 04:18
Jean-Marc Spaggiari 2013-06-06, 14:57
kiran 2013-06-12, 12:10
Nicolas Liochon 2013-06-12, 14:08
rajesh babu chintaguntla 2013-06-12, 15:20
kiran 2013-06-12, 16:11
Nicolas Liochon 2013-06-12, 17:19
kiran 2013-06-13, 03:43
Re: Handling regionserver crashes in production cluster
Hmm... So even a simple get shows the issue?
It would be a (surprising) critical bug. Could you please try 0.95.1 or
0.94.8? Or write a unit test?

Thanks,

Nicolas
On Thu, Jun 13, 2013 at 5:43 AM, kiran <[EMAIL PROTECTED]> wrote:

> It's a simple kill...
> The scan uses a start row and a stop row:
> Scan scan = new Scan(Bytes.toBytes("adidas"), Bytes.toBytes("adidas1"));
>
>
> Our cluster size is 15. The load average, as seen in the master, is 78%... It
> is not that overloaded, but writes are happening in the cluster...
>
> Thanks
> Kiran
>
>
>
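To make the scope of the scan quoted above concrete: a start row is inclusive, a stop row is exclusive, and HBase orders row keys as unsigned bytes, lexicographically. The sketch below reproduces that comparison in plain Java, with no HBase dependency, to show which keys fall inside ["adidas", "adidas1"). The sample row keys and the class and method names are made up for illustration; the poster's real key schema is not given in the thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: mimics row-key range selection without using HBase classes.
class ScanRangeDemo {

    // Unsigned lexicographic byte comparison (Java 9+); a shorter key that is
    // a prefix of a longer key sorts first.
    static int compareRows(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    // True if row lies in [startRow, stopRow): start inclusive, stop exclusive.
    static boolean inRange(String row, String startRow, String stopRow) {
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        byte[] start = startRow.getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRow.getBytes(StandardCharsets.UTF_8);
        return compareRows(r, start) >= 0 && compareRows(r, stop) < 0;
    }

    // Returns the rows that a scan over [startRow, stopRow) would cover.
    static List<String> select(List<String> rows, String startRow, String stopRow) {
        List<String> selected = new ArrayList<>();
        for (String row : rows) {
            if (inRange(row, startRow, stopRow)) {
                selected.add(row);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Hypothetical row keys, not taken from the poster's table.
        List<String> rows = List.of(
                "adidas", "adidas-shoes", "adidas0", "adidas1", "adidas_shoes", "nike");
        // prints [adidas, adidas-shoes, adidas0]
        System.out.println(select(rows, "adidas", "adidas1"));
    }
}
```

Note that "adidas_shoes" is excluded because '_' sorts after '1', so this range is not the same thing as "all keys starting with adidas".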
> On Wed, Jun 12, 2013 at 10:49 PM, Nicolas Liochon <[EMAIL PROTECTED]>
> wrote:
>
> > Yeah, it should not block the other regions.
> >
> > For the region server, was it a kill -9 or a simple kill (the former
> > triggers a recovery, the latter will close the regions before stopping the
> > process)?
> >
> > How do you select the scan scope? With stop/start rows?
> > Can you share the client code you're using?
> > What's the cluster size? Was it already very loaded before you killed the
> > region server?
> >
> > Nicolas
> >
> >
> >
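The distinction drawn above between kill -9 and a plain kill comes down to general JVM behavior: a plain kill sends SIGTERM, which lets the JVM run its shutdown hooks, while kill -9 sends SIGKILL, which stops the process with no chance to clean up. The sketch below shows only that generic mechanism. It is not HBase code, and the printed message is a stand-in for whatever cleanup a real server would do; the class and method names are hypothetical.

```java
// Illustration only: generic JVM shutdown-hook behavior, not HBase internals.
class GracefulStopDemo {

    // Registers a hook that the JVM runs on normal exit or SIGTERM (plain kill).
    // The hook never runs on SIGKILL (kill -9).
    static Thread registerCloseHook(Runnable closeAction) {
        Thread hook = new Thread(closeAction, "close-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) throws InterruptedException {
        registerCloseHook(() -> System.out.println("cleaning up before exit"));
        long pid = ProcessHandle.current().pid();
        System.out.println("running as pid " + pid
                + "; compare 'kill " + pid + "' with 'kill -9 " + pid + "'");
        // Block until the process is signalled from another terminal.
        Thread.sleep(Long.MAX_VALUE);
    }
}
```

Running it and sending a plain kill prints the cleanup message; sending kill -9 prints nothing, which is why the second case has to be handled by recovery on the surviving nodes.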
> > On Wed, Jun 12, 2013 at 6:11 PM, kiran <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Yes we killed the region server but datanode is still running on the
> > > node...
> > >
> > > Sample test scenario: assume I have a table pre-split from a up to z
> > > (about 26 regions). I purposely brought down the region server hosting
> > > the regions with prefixes c and d. Then I used the client API to scan
> > > data from regions with prefixes other than c and d. The response was
> > > very slow and sometimes did not come at all.
> > >
> > > My question was: if only the regions with prefixes c and d are being
> > > relocated or are in transition, why does it affect the regions with
> > > other prefixes? But once the region transition is over, the response is
> > > very fast, as expected.
> > >
> > >
> > >
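The expectation in the scenario above (a request for a row outside prefixes c and d should not depend on the regions in transition) can be sketched as a lookup from row key to the region whose start key is the greatest one not exceeding the row. The model below is a simplification in plain Java: it assumes 26 regions with start keys "", "b", "c", ... "z", which is only a guess at the pre-split layout described, and it uses Java string ordering, which matches byte ordering for ASCII keys only. All names are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Illustration only: a toy model of row-to-region lookup by start key.
class RegionLookupDemo {

    // Assumed layout: the first region starts at the empty key, then one
    // region per letter from b to z, 26 regions in total.
    static TreeSet<String> startKeys() {
        TreeSet<String> keys = new TreeSet<>();
        keys.add("");
        for (char c = 'b'; c <= 'z'; c++) {
            keys.add(String.valueOf(c));
        }
        return keys;
    }

    // The region holding a row is the one with the greatest start key <= row.
    static String regionStartKeyFor(TreeSet<String> startKeys, String row) {
        return startKeys.floor(row);
    }

    // True if the row lives in one of the regions currently offline.
    static boolean touchesOfflineRegion(TreeSet<String> startKeys,
                                        Set<String> offlineStartKeys,
                                        String row) {
        return offlineStartKeys.contains(regionStartKeyFor(startKeys, row));
    }

    public static void main(String[] args) {
        TreeSet<String> keys = startKeys();
        Set<String> offline = Set.of("c", "d"); // regions on the killed server
        for (String row : new String[] {"adidas", "cat", "dog", "nike"}) {
            System.out.println(row + " -> region starting at \""
                    + regionStartKeyFor(keys, row) + "\", offline region: "
                    + touchesOfflineRegion(keys, offline, row));
        }
    }
}
```

In this model only "cat" and "dog" map to offline regions, which is why slow responses for rows such as "adidas" are surprising and worth investigating.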
> > > On Wed, Jun 12, 2013 at 8:50 PM, rajesh babu chintaguntla <
> > > [EMAIL PROTECTED]> wrote:
> > >
> > > > You can set the property below to a higher value to close more regions
> > > > at a time.
> > > >
> > > >   <property>
> > > >     <name>hbase.regionserver.executor.closeregion.threads</name>
> > > >     <value>3</value>
> > > >   </property>
> > > >
> > > >
> > > > On Wed, Jun 12, 2013 at 7:38 PM, Nicolas Liochon <[EMAIL PROTECTED]>
> > > > wrote:
> > > >
> > > > > What was your test exactly? You did a kill -9 on a region server but
> > > > > kept the datanode alive?
> > > > > Could you detail the queries you were doing?
> > > > >
> > > > >
> > > > > On Wed, Jun 12, 2013 at 2:10 PM, kiran <[EMAIL PROTECTED]>
> > > > > wrote:
> > > > >
> > > > > > It is not possible for us to migrate to a new version immediately.
> > > > > >
> > > > > > @Anoop we purposely brought down one regionserver, then we observed
> > > > > > that the website was taking too much time to respond. We observed
> > > > > > this pattern for about 5 min, until the regions were relocated.
> > > > > > Also, we issued queries on our website taking care that the queries
> > > > > > didn't fall in the regions on the regionserver we brought down.
> > > > > >
> > > > > > Is there any configuration workaround to mitigate it?
> > > > > >
> > > > > > Thanks
> > > > > > Kiran
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Jun 6, 2013 at 8:27 PM, Jean-Marc Spaggiari
> > > > > > <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > Hi Kiran,
> > > > > > >
> > > > > > > Also, any chance for you to migrate to 0.94.8? There have been
> > > > > > > hundreds of fixes since 0.94.1...
> > > > > > >
> > > > > > > JM
> > > > > > >
> > > > > > > 2013/6/6 Anoop John <[EMAIL PROTECTED]>:
> > > > > > > > How many total RS in the cluster? You mean you cannot do any
> > > > > > > > operation on other regions in the live cluster? It should not
> > > > > > > > happen. Is it happening that the client ops are targeted at the
> > > > > > > > regions

Sandeep L 2013-09-02, 11:23