Re: Recovery from cluster wide failure
Stack 2012-11-30, 20:16
On Fri, Nov 30, 2012 at 8:56 AM, Bryan Baugher <[EMAIL PROTECTED]> wrote:
> Unfortunately it does not seem like HTable or HTablePool have any logic to
> tell the HConnectionManager the connection is stale and I don't believe you
> can rely on all of the clients giving back the connection at the same time
> in order to solve this issue.
> So I have a couple of questions:
> 1. Since HConnectionImplementation understands if it is being managed or
> not, would it make sense for it to remove itself from the
> HConnectionManager cache when abort(String, Throwable) is called via
> deleteStaleConnection(..)? Notice that the close() method currently does
> something similar.
Sounds right, yes.
> 2. Should HConnectionManager delete connections that are closed/aborted and
> have been passed back to it via the deleteConnection methods?
Also sounds like the right thing to do.
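
To make the two proposals concrete, here is a minimal sketch of the behavior being discussed. It is not HBase code: CachedConnection and ConnectionCache are hypothetical stand-ins for HConnectionImplementation and HConnectionManager, and the reference counting is simplified. It assumes a cache keyed by a string and Java 8 or later for Map.remove(key, value).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a shared connection; NOT the real HConnectionImplementation.
class CachedConnection {
    final String key;
    private final ConnectionCache owner; // null means unmanaged (never cached)
    private volatile boolean aborted;
    int refCount; // guarded by the owning cache's lock

    CachedConnection(String key, ConnectionCache owner) {
        this.key = key;
        this.owner = owner;
    }

    boolean isAborted() {
        return aborted;
    }

    // Proposal 1: a managed connection removes itself from the cache on abort.
    void abort(String why, Throwable cause) {
        aborted = true;
        if (owner != null) {
            owner.evict(this);
        }
    }
}

// Hypothetical stand-in for the connection cache; NOT the real HConnectionManager.
class ConnectionCache {
    private final Map<String, CachedConnection> connections = new HashMap<>();

    // Returns the shared connection for the key, replacing a stale entry if one is found.
    synchronized CachedConnection getConnection(String key) {
        CachedConnection conn = connections.get(key);
        if (conn == null || conn.isAborted()) {
            conn = new CachedConnection(key, this);
            connections.put(key, conn);
        }
        conn.refCount++;
        return conn;
    }

    // Proposal 2: when a connection is handed back, drop it if it is stale,
    // even when other clients still hold references to it.
    synchronized void deleteConnection(CachedConnection conn) {
        conn.refCount--;
        if (conn.refCount <= 0 || conn.isAborted()) {
            evict(conn);
        }
    }

    // Removes the entry only if the key still maps to this exact instance,
    // so a fresh replacement is never evicted by mistake.
    synchronized void evict(CachedConnection conn) {
        connections.remove(conn.key, conn);
    }

    synchronized boolean isCached(String key) {
        return connections.containsKey(key);
    }
}
```

With this shape, the next getConnection call after an abort gets a fresh instance without waiting for every client to hand the stale one back, and a late deleteConnection on the stale instance leaves the replacement alone.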
> Although I wish I had a JUnit test that could show this, I also believe that an
> HConnectionImplementation can become aborted during construction. We saw
> this happening while the cluster services were down, HConnectionManager
> would retrieve a new HConnection but it would come to us already
> There are a couple of other issues with HTablePool and dealing with this
> issue but these behaviors seem like they would need to be addressed first.
>  - https://issues.apache.org/jira/browse/HBASE-6956
What do you think of what Igor pasted into the issue?