HBase >> mail # user >> Re: Recovery from cluster wide failure

Stack 2012-11-30, 20:16
Re: Recovery from cluster wide failure
I am honestly a little confused by what Igor's factory/table gets you, as it
only seems to check whether the table is closed and to affect close().

The way I see it there are 3 things that need to be fixed in order to get
this to work for HTablePool.

1. The pool needs a way to determine if a table is invalid and not add it
back in when consumers call close()
2. HTable/HTablePool needs a way to proactively remove closed/stale
HConnections from HConnectionManager
3. The constructors for HTable that do not take in an HConnection need to
delete the connection when an exception occurs

If the improvements discussed earlier in this thread (quoted below) are
implemented, then 2 is already taken care of, since htable.close() deletes
the connection, and the same goes for 3.
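
To make the connection-cache behavior under discussion concrete, here is a minimal, self-contained sketch. It does not use the real HBase classes: ConnectionCache and CachedConnection are hypothetical stand-ins for the roles played by HConnectionManager and HConnectionImplementation, and the method names are assumptions for illustration only. The sketch shows a cached connection removing itself from the cache on abort() or close(), and the cache refusing to hand out an entry that is already dead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the cache role of HConnectionManager (not the real class).
final class ConnectionCache {
    private final Map<String, CachedConnection> cache = new HashMap<>();

    // Returns the cached connection for the key, replacing it if it is closed or aborted.
    synchronized CachedConnection get(String key) {
        CachedConnection c = cache.get(key);
        if (c == null || c.isDead()) {
            c = new CachedConnection(key, this);
            cache.put(key, c);
        }
        return c;
    }

    // Removes the entry only if it is this exact instance, so a stale
    // connection cannot evict the connection that replaced it.
    synchronized void evict(CachedConnection c) {
        cache.remove(c.key, c);
    }

    synchronized boolean contains(String key) {
        return cache.containsKey(key);
    }
}

// Hypothetical stand-in for a managed connection (not HConnectionImplementation).
final class CachedConnection {
    final String key;
    private final ConnectionCache owner;
    private volatile boolean closed;
    private volatile boolean aborted;

    CachedConnection(String key, ConnectionCache owner) {
        this.key = key;
        this.owner = owner;
    }

    // On abort, mark the connection dead and remove it from the cache.
    void abort(String why, Throwable cause) {
        aborted = true;
        owner.evict(this);
    }

    // close() does the same cleanup as abort().
    void close() {
        closed = true;
        owner.evict(this);
    }

    boolean isDead() {
        return closed || aborted;
    }
}
```

The identity check in evict() is a design choice of this sketch: a late close() on an old connection should not remove the fresh connection created after the abort.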

1 is the hardest as HTablePool has no concept of an HTable or HConnection.
My original idea was to add a method to HTableInterfaceFactory for checking
if the table is valid and an additional method added to HTable for checking
if the connection is closed or aborted, but even that seems awkward.
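
The validity-check idea for point 1 could look roughly like the sketch below. Everything in it is hypothetical: PooledTable, PooledTableFactory, isUsable(), isValid() and ValidatingTablePool are illustrative names, not part of the HBase API, and the pool is reduced to a single table name to keep it short. It only shows the shape of the change, namely that the pool asks the factory whether a table is still valid before putting it back.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a pooled table (not the real HTableInterface).
interface PooledTable {
    boolean isUsable(); // assumed hook: false once the underlying connection is closed or aborted
    void release();     // assumed hook: frees the table's resources for good
}

// Hypothetical factory with the extra validity method suggested in the message.
interface PooledTableFactory {
    PooledTable create(String tableName);
    boolean isValid(PooledTable table);
}

// Minimal pool for one table name: invalid tables are released instead of re-pooled.
final class ValidatingTablePool {
    private final Deque<PooledTable> idle = new ArrayDeque<>();
    private final PooledTableFactory factory;
    private final String tableName;
    private final int maxSize;

    ValidatingTablePool(PooledTableFactory factory, String tableName, int maxSize) {
        this.factory = factory;
        this.tableName = tableName;
        this.maxSize = maxSize;
    }

    // Hands out an idle table if one is still valid, otherwise creates a new one.
    synchronized PooledTable borrow() {
        PooledTable t;
        while ((t = idle.poll()) != null) {
            if (factory.isValid(t)) {
                return t;
            }
            t.release(); // went stale while sitting idle
        }
        return factory.create(tableName);
    }

    // Called when a consumer is done; only valid tables go back into the pool.
    synchronized void giveBack(PooledTable t) {
        if (factory.isValid(t) && idle.size() < maxSize) {
            idle.push(t);
        } else {
            t.release();
        }
    }

    synchronized int idleCount() {
        return idle.size();
    }
}
```

Checking validity on borrow as well as on return is an extra assumption of this sketch; it covers tables whose connection dies while they sit idle in the pool.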

On Fri, Nov 30, 2012 at 2:16 PM, Stack <[EMAIL PROTECTED]> wrote:

> On Fri, Nov 30, 2012 at 8:56 AM, Bryan Baugher <[EMAIL PROTECTED]> wrote:
> > Unfortunately it does not seem like HTable or HTablePool have any logic to
> > tell the HConnectionManager the connection is stale and I don't believe you
> > can rely on all of the clients giving back the connection at the same time
> > in order to solve this issue.
> >
> > So I have a couple questions,
> >
> > 1. Since HConnectionImplementation understands if it is being managed or
> > not, would it make sense for it to remove itself from the
> > HConnectionManager cache when abort(String, Throwable) is called via
> > deleteStaleConnection(..)? Notice that the close() method currently does
> > something similar.
> >
> >
> Sounds right, yes.
>
> > 2. Should HConnectionManager delete connections that are closed/aborted and
> > have been passed back to it via the deleteConnection methods?
> >
> Also sounds like the right thing to do.
>
> > Although I wish I had a junit that could show this, I also believe that a
> > HConnectionImplementation can become aborted during construction. We saw
> > this happening while the cluster services were down, HConnectionManager
> > would retrieve a new HConnection but it would come to us already
> > closed/aborted.
> >
> > There are a couple other issues with HTablePool[1] and dealing with this
> > issue but these behaviors seem like they would need to be addressed first.
> >
> > [1] - https://issues.apache.org/jira/browse/HBASE-6956
> >
> >
> What do you think of what Igor pasted into the issue?
> St.Ack
