HBase, mail # user - Handling regionserver crashes in production cluster


Re: Handling regionserver crashes in production cluster
Nicolas Liochon 2013-06-12, 17:19
Yeah, it should not block the other regions.

For the region server, was it a kill -9 or a simple kill? (The former
triggers a recovery; the latter closes the regions before stopping the
process.)

How do you select the scan scope? With stop/start rows?
Can you share the client code you're using?
What's the cluster size? Was it already very loaded before you killed the
region server?

Nicolas
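
To illustrate the scan-scope question above: the sketch below is a toy model, not HBase client code. It assumes the table is split at "b" through "z" (26 regions), that an empty start or stop row means "unbounded" as it does for an HBase scan, and that keys are plain ASCII so String comparison matches byte order. The names ScanScopeSketch, Region, preSplitAtoZ and regionsTouched are hypothetical. It shows which regions a scan over [startRow, stopRow) would have to contact.

```java
import java.util.ArrayList;
import java.util.List;

public class ScanScopeSketch {

    // Hypothetical region model: holds rows in [startKey, endKey).
    // An empty string means "unbounded" on that side.
    static final class Region {
        final String startKey;
        final String endKey;

        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    // Build 26 regions from split points "b".."z":
    // ["", "b"), ["b", "c"), ..., ["y", "z"), ["z", "").
    static List<Region> preSplitAtoZ() {
        List<Region> regions = new ArrayList<>();
        String prev = "";
        for (char c = 'b'; c <= 'z'; c++) {
            String split = String.valueOf(c);
            regions.add(new Region(prev, split));
            prev = split;
        }
        regions.add(new Region(prev, ""));
        return regions;
    }

    // Return the start keys of every region overlapping [startRow, stopRow).
    static List<String> regionsTouched(List<Region> regions, String startRow, String stopRow) {
        List<String> touched = new ArrayList<>();
        for (Region r : regions) {
            // The region begins before the scan ends (or the scan has no end).
            boolean startsBeforeStop = stopRow.isEmpty() || r.startKey.compareTo(stopRow) < 0;
            // The region ends after the scan begins (or the region has no end).
            boolean endsAfterStart = r.endKey.isEmpty() || startRow.compareTo(r.endKey) < 0;
            if (startsBeforeStop && endsAfterStart) {
                touched.add(r.startKey);
            }
        }
        return touched;
    }

    public static void main(String[] args) {
        List<Region> regions = preSplitAtoZ();
        // Bounded scan: only the regions starting at "e" and "f" are involved.
        System.out.println(regionsTouched(regions, "e", "g"));
        // Unbounded scan: every region is involved, including "c" and "d".
        System.out.println(regionsTouched(regions, "", "").size());
    }
}
```

In this model a scan bounded to "e".."g" never needs the regions starting at "c" or "d", whereas a scan without start/stop rows touches all 26 regions and would therefore also wait on any region that is still in transition.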

On Wed, Jun 12, 2013 at 6:11 PM, kiran <[EMAIL PROTECTED]> wrote:

> Yes, we killed the region server, but the datanode is still running on the
> node...
>
> Sample test scenario: Assume I have a table with pre-splits a up to z (about
> 26 regions). I purposefully brought down the region server hosting the
> regions with prefixes c and d. Then I used the client API to scan data from
> regions with prefixes other than c and d. The response was very slow, and
> sometimes did not come at all.
>
> My question is: if only the regions with prefixes c and d are being
> relocated or in transition, why does it affect the regions with other
> prefixes? Once the region transition is over, the response is very fast, as
> expected.
>
>
>
> On Wed, Jun 12, 2013 at 8:50 PM, rajesh babu chintaguntla <
> [EMAIL PROTECTED]> wrote:
>
> > You can set the property below to a higher value to close more regions at
> > a time.
> >
> >  <property>
> >     <name>hbase.regionserver.executor.closeregion.threads</name>
> >     <value>3</value>
> >   </property>
> >
> >
> > On Wed, Jun 12, 2013 at 7:38 PM, Nicolas Liochon <[EMAIL PROTECTED]>
> > wrote:
> >
> > > What was your test exactly? You killed -9 a region server but kept the
> > > datanode alive?
> > > Could you detail the queries you were doing?
> > >
> > >
> > > On Wed, Jun 12, 2013 at 2:10 PM, kiran <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > > It is not possible for us to migrate to a new version immediately.
> > > > >
> > > > > @Anoop we purposefully brought down one regionserver, and then
> > > > > observed that the website was taking too long to respond. We
> > > > > observed this pattern for about 5 min, until the regions were
> > > > > relocated.
> > > > > Also, we issued the queries from our website taking care that they
> > > > > did not fall under the regions of the regionserver we brought down.
> > > > >
> > > > > Is there any configuration workaround to mitigate it?
> > > >
> > > > Thanks
> > > > Kiran
> > > >
> > > >
> > > >
> > > > > On Thu, Jun 6, 2013 at 8:27 PM, Jean-Marc Spaggiari
> > > > > <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi Kiran,
> > > > >
> > > > > Also, any chance for you to migrate to 0.94.8? There have been
> > > > > hundreds of fixes since 0.94.1...
> > > > >
> > > > > JM
> > > > >
> > > > > 2013/6/6 Anoop John <[EMAIL PROTECTED]>:
> > > > > > > How many total RS in the cluster?  You mean you cannot do any
> > > > > > > operation on other regions in the live cluster?  That should not
> > > > > > > happen.  Could it be that the client ops are targeted at the
> > > > > > > regions which were on the dead RS (and are in transition now)?
> > > > > > > Can you take a closer look and see?
> > > > > > > If not, please check the RS threads to see where they are
> > > > > > > getting blocked.
> > > > > >
> > > > > > -Anoop-
> > > > > >
> > > > > > > On Wed, Jun 5, 2013 at 10:50 PM, kiran <[EMAIL PROTECTED]>
> > > > > > > wrote:
> > > > > >
> > > > > >> Dear All,
> > > > > >>
> > > > > > >> We have a production cluster that runs on hbase 0.94.1. The
> > > > > > >> issue we are facing is that whenever one regionserver goes
> > > > > > >> down, the cluster becomes unresponsive until all the regions
> > > > > > >> are allocated to other regionserver(s). The transition takes
> > > > > > >> about 3-5 mins, and during this time we are unable to do any
> > > > > > >> client operation on the cluster.
> > > > > > >>
> > > > > > >> Is there any way we can make the transition run in the
> > > > > > >> background?
> > > > > > >>
> > > > > > >> Also, it is acceptable for us if the client operations such as
> > > > > > >> scan or