HBase, mail # dev - assignment - is master being a watchdog useful?


Re: assignment - is master being a watchdog useful?
Jimmy Xiang 2012-12-06, 18:35
We can't make the assignment znode ephemeral.  It is used to track
region assignments and to drive recovery.  For example, suppose a region
is moving from rs A to rs B, and both B and the master die while the
region is still opening on B.  If the znode goes away with B, the new
master (the former backup) will think the region is still open on rs A,
since A is live and meta still shows the region on A, which is not the
case.

Thanks,
Jimmy
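
The failover scenario above can be illustrated with a toy model. This is only a sketch under stated assumptions, not HBase code: the class, the method names, and the two maps standing in for meta and the ZooKeeper transition znodes are all hypothetical, and it assumes rs A has already closed the region by the time B and the master die.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of master failover; every name here is hypothetical. */
public class AssignmentZnodeSketch {

    /** What a newly active master would conclude about a region. */
    public static String recoverLocation(Map<String, String> meta,
                                         Map<String, String> transitionZnodes,
                                         String region) {
        String openingOn = transitionZnodes.get(region);
        if (openingOn != null) {
            // The znode survived, so the master knows the region was mid-move.
            return "IN_TRANSITION(" + openingOn + ")";
        }
        // No znode left: meta is the only evidence, and it may be stale.
        return "OPEN(" + meta.get(region) + ")";
    }

    /** Replays the scenario with an ephemeral or a persistent znode. */
    public static String simulate(boolean ephemeralZnode) {
        Map<String, String> meta = new HashMap<>();
        Map<String, String> znodes = new HashMap<>();
        meta.put("region1", "rsA");   // meta not yet updated (assumed: A already closed it)
        znodes.put("region1", "rsB"); // region is opening on B

        // rs B and the active master die at this point.
        if (ephemeralZnode) {
            znodes.remove("region1"); // an ephemeral znode vanishes with B's session
        }
        return recoverLocation(meta, znodes, "region1");
    }

    public static void main(String[] args) {
        System.out.println("persistent znode: " + simulate(false));
        System.out.println("ephemeral znode:  " + simulate(true));
    }
}
```

With the persistent znode the new master sees the region as in transition and can reassign it; with the ephemeral one it falls back to meta and wrongly concludes the region is open on rs A.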

On Thu, Dec 6, 2012 at 10:18 AM, Sergey Shelukhin
<[EMAIL PROTECTED]> wrote:
> I may be missing some past context here, but why not make it so that the
> assignment zookeeper node is ephemeral, so it dies with the server?
> Then it will be possible to notice there's no more assignment without the
> separate watcher.
>
> I have conflicting opinions about the current safeguard; on one hand, I've
> seen at least one bug (HBASE-6060) that was fixed (in 0.96 but explicitly
> not in 0.94) that resulted in a region never being assigned (until the 30min
> watcher kicked in, that is).
> On the other hand, making catch-alls for code bugs in this manner seems
> like a bad practice.
> Maybe we can remove it when we have "bulletproof" unit(!) tests for AM that
> take into account various scenarios.
>
> On Thu, Dec 6, 2012 at 9:26 AM, Jimmy Xiang <[EMAIL PROTECTED]> wrote:
>
>> Currently, the rs doesn't watch the znode.  The RS cancels an ongoing
>> open after the master tells it to.
>>
>> Jimmy
>>
>> On Wed, Dec 5, 2012 at 7:53 PM, Stack <[EMAIL PROTECTED]> wrote:
>> > On Wed, Dec 5, 2012 at 6:57 PM, Jimmy Xiang <[EMAIL PROTECTED]> wrote:
>> >
>> >> If this region server happens to be hot, it may take a while to open
>> >> it.  If we don't time it out, the server may get even hotter.  If the
>> >> region server could not open it here, other region servers may not be
>> >> able to open it either.
>> >>
>> >
>> >
>> > I suppose the master can still 'timeout' the open if the RS is watching the
>> > znode for the region it is trying to open.  The RS will notice that master
>> > has assumed control in a callback and can then cancel any ongoing open.
>> >
>> > St.Ack
>>