We can't make the assignment znode ephemeral. It is used to track
region assignments and for recovery. For example, suppose a region is
moving from RS A to RS B, and while it is opening on B, both B and the
master die. If the znode is gone with B, then the new master (the
former backup) will think the region is still open on RS A, since A is
live and meta still shows the region on A, which is not the case.
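To make the scenario concrete, here is a minimal sketch in plain Python.
It is a toy model, not HBase or ZooKeeper code: ToyZk,
recover_region_state, simulate and the /unassigned/ path are all
hypothetical names, and it assumes an ephemeral znode would be owned by
B's session and that the region is already closed on A.

```python
# Toy model only: every name here is hypothetical, none of it is HBase API.

class ToyZk:
    """In-memory stand-in for ZooKeeper that remembers each znode's owner."""

    def __init__(self):
        # path -> (data, owning session, or None for a persistent znode)
        self.nodes = {}

    def create(self, path, data, ephemeral_owner=None):
        self.nodes[path] = (data, ephemeral_owner)

    def expire_session(self, session):
        # Ephemeral znodes vanish together with the session that owns them.
        self.nodes = {p: v for p, v in self.nodes.items() if v[1] != session}

    def get(self, path):
        entry = self.nodes.get(path)
        return entry[0] if entry else None


def recover_region_state(zk, meta, live_servers, region):
    """What a newly started master could conclude about one region."""
    znode = zk.get("/unassigned/" + region)
    if znode is not None:
        return "in transition: " + znode
    server = meta[region]
    if server in live_servers:
        return "open on " + server
    return "needs assignment"


def simulate(ephemeral):
    zk = ToyZk()
    meta = {"region1": "A"}  # meta still points at A during the move
    # Region is moving A -> B: assumed closed on A, currently opening on B.
    owner = "B" if ephemeral else None
    zk.create("/unassigned/region1", "OPENING on B", ephemeral_owner=owner)
    zk.expire_session("B")  # B dies (the old master dies as well)
    return recover_region_state(zk, meta, live_servers={"A"}, region="region1")
```

With simulate(ephemeral=True) the new master finds no znode and falls
back to meta, so it concludes the region is open on A. With
simulate(ephemeral=False) the znode survives B's death and the new
master can see the region was still in transition.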
On Thu, Dec 6, 2012 at 10:18 AM, Sergey Shelukhin
<[EMAIL PROTECTED]> wrote:
> I may be missing some past context here, but why not make it so that the
> assignment zookeeper node is ephemeral, so it dies with the server?
> Then it will be possible to notice there's no more assignment without the
> separate watcher.
> I have conflicting opinions about the current safeguard; on one hand, I've
> seen at least one bug (HBASE-6060) that was fixed (on 0.96 but explicitly
> not in 0.94) that resulted in a region never being assigned (until the 30min
> watcher kicked in, that is).
> On the other hand, making catch-alls for code bugs in this manner seems
> like a bad practice.
> Maybe we can remove it when we have "bulletproof" unit(!) tests for AM that
> take into account various scenarios.
> On Thu, Dec 6, 2012 at 9:26 AM, Jimmy Xiang <[EMAIL PROTECTED]> wrote:
>> Currently, the RS doesn't watch the znode. The RS cancels an ongoing open
>> after the master tells it to.
>> On Wed, Dec 5, 2012 at 7:53 PM, Stack <[EMAIL PROTECTED]> wrote:
>> > On Wed, Dec 5, 2012 at 6:57 PM, Jimmy Xiang <[EMAIL PROTECTED]> wrote:
>> >> If this region server happens to be hot, it may take a while to open
>> >> it. If we don't time it out, the server may be even hotter. If the
>> >> region server could not open it here, other region servers may not be
>> >> able to open it either.
>> > I suppose the master can still 'timeout' the open if the RS is watching the
>> > znode for the region it is trying to open. The RS will notice in a callback
>> > that the master has assumed control and can then cancel any ongoing open.
>> > St.Ack