HBase >> mail # dev >> Improving Coprocessor postSplit/postOpen synchronization
RE: Improving Coprocessor postSplit/postOpen synchronization
Hi Kevin,

I am very interested in seeing this. We have done something similar
internally, but along with the new coprocessor hooks we added, we also
tweaked the kernel side a bit.
It works roughly like this. Divide your split steps into two parts:

the steps before the PONR (point of no return) and the steps after it.

First, run the pre-PONR steps for the main region.
Then run them for the index region in the preSplit hook.

Next, populate the information you need in a thread local and read it on
the kernel side. Use this information to make a single put to META, so
that the parent regions of both the index and the main region are
offlined together.
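To make the thread-local handoff concrete, here is a minimal Java sketch. It assumes the preSplit hook and the split code run on the same region-server thread. `SplitInfoHandoff`, `IndexSplitInfo`, `onIndexPreSplitDone` and `buildMetaEdits` are hypothetical names, not HBase APIs, and plain strings stand in for the real META mutations. The sketch only collects both parents' offline edits into one batch; how that single META write is made atomic is outside its scope.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: hand index-split state from a hook to the split code. */
public class SplitInfoHandoff {

    /** What the index region's pre-PONR steps produced (hypothetical type). */
    static final class IndexSplitInfo {
        final String parent;
        final String daughterA;
        final String daughterB;

        IndexSplitInfo(String parent, String daughterA, String daughterB) {
            this.parent = parent;
            this.daughterA = daughterA;
            this.daughterB = daughterB;
        }
    }

    // Both steps run on the same thread, so a ThreadLocal passes the data
    // without changing any method signatures.
    static final ThreadLocal<IndexSplitInfo> PENDING = new ThreadLocal<>();

    /** Called from the (hypothetical) preSplit hook after the index pre-PONR steps. */
    static void onIndexPreSplitDone(IndexSplitInfo info) {
        PENDING.set(info);
    }

    /**
     * Called on the kernel side: returns ONE batch of META edits offlining
     * the main parent and, if present, the index parent.
     */
    static List<String> buildMetaEdits(String mainParent, String mainA, String mainB) {
        IndexSplitInfo info = PENDING.get();
        PENDING.remove(); // always clear, so a pooled thread does not leak stale state
        List<String> edits = new ArrayList<>();
        edits.add("offline " + mainParent + " -> " + mainA + ", " + mainB);
        if (info != null) {
            edits.add("offline " + info.parent + " -> " + info.daughterA + ", " + info.daughterB);
        }
        return edits;
    }

    public static void main(String[] args) {
        onIndexPreSplitDone(new IndexSplitInfo("idx-parent", "idx-a", "idx-b"));
        List<String> edits = buildMetaEdits("main-parent", "main-a", "main-b");
        edits.forEach(System.out::println);
    }
}
```

Clearing the thread local in the same call that reads it keeps a failed or skipped split from leaving state behind for the next request handled by that thread.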

Then do the post-PONR steps for the index region first and then for the
main region. What do you think of this approach?
As for rollback, it has to happen in the reverse order: roll back the index
region first and then the main region. Any failure after the PONR, however,
should simply abort the RS, regardless of whether it hit the main or the
index region, so that the restart handling can take over; this is safe
because the OFFLINING step was done atomically as suggested above.
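The ordering and failure rules above can be sketched in Java as follows. This is an illustration under stated assumptions, not HBase's split code: the `Step` interface and `CombinedSplit` class are hypothetical, the `Runnable` stands for the single atomic META update that acts as the point of no return, and "abort RS" is only recorded as a flag rather than actually aborting a server.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the combined main/index split ordering. */
public class CombinedSplit {

    /** One region's share of the split; not an HBase type. */
    interface Step {
        void prePonr() throws Exception;  // steps before the point of no return
        void rollback();                  // undo pre-PONR work, including partial work
        void postPonr() throws Exception; // steps after the point of no return
    }

    final List<String> log = new ArrayList<>();
    boolean aborted = false;

    void execute(Step main, Step index, Runnable atomicMetaOffline) {
        List<Step> started = new ArrayList<>();
        try {
            started.add(main);
            main.prePonr();
            started.add(index);
            index.prePonr(); // in Ram's scheme this runs from the preSplit hook
        } catch (Exception e) {
            // Before the PONR: roll back in reverse order, index first, then main.
            for (int i = started.size() - 1; i >= 0; i--) {
                started.get(i).rollback();
            }
            log.add("rolled back");
            return;
        }
        // The PONR: one META update offlines both parents together.
        // Failure of this write itself is not modeled here.
        atomicMetaOffline.run();
        try {
            index.postPonr();
            main.postPonr();
        } catch (Exception e) {
            // After the PONR: never roll back; record that the region server
            // must abort so restart handling completes the split from META.
            aborted = true;
            log.add("abort RS");
        }
    }

    /** Test double that records each call and can fail on demand. */
    static Step recording(String name, List<String> log, boolean failPre, boolean failPost) {
        return new Step() {
            public void prePonr() throws Exception {
                log.add(name + " pre");
                if (failPre) throw new Exception(name + " pre failed");
            }
            public void rollback() {
                log.add(name + " rollback");
            }
            public void postPonr() throws Exception {
                log.add(name + " post");
                if (failPost) throw new Exception(name + " post failed");
            }
        };
    }

    public static void main(String[] args) {
        CombinedSplit split = new CombinedSplit();
        split.execute(recording("main", split.log, false, false),
                      recording("index", split.log, false, false),
                      () -> split.log.add("META: offline both parents"));
        System.out.println(split.log);
    }
}
```

Each step is added to `started` before it runs, so a step that fails partway through still gets its `rollback()` call; that is why `rollback()` must tolerate partially completed work.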

Looking forward to your patch as well.
Regards
Ram

> -----Original Message-----
> From: Kevin Shin [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, August 28, 2012 3:40 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Improving Coprocessor postSplit/postOpen synchronization
>
> Thanks Ted,
>
> Rather than adding code to pre/postOpen, we're going to see whether we
> can add one more coprocessor call instead, to enforce modularity between
> splits and opens. Will submit a patch soon.
>
> Best,
> Kevin
>
> On Mon, Aug 27, 2012 at 1:49 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > Ramkrishna recently checked in HBASE-6633.
> >
> > But that doesn't seem to address your use case.
> >
> > Go ahead and file a JIRA.
> >
> > On Mon, Aug 27, 2012 at 1:29 PM, Kevin Shin <[EMAIL PROTECTED]> wrote:
> >
> > > Hi everyone,
> > >
> > > A colleague and I were working with HBase coprocessors for secondary
> > > indexes and ran into an interesting problem regarding splits and
> > > synchronizing the corresponding parent/daughter regions.
> > >
> > > The goal with splits is to create two new daughter regions with the
> > > corresponding splits of the secondary indexes and to lock these regions
> > > so that Puts/Deletes that arrive while postSplit is in progress are
> > > queued up and we don't run into consistency issues. That is, if a
> > > Delete is called before a daughter region receives its split index,
> > > that Delete would effectively be ignored, so we want to wait until
> > > postSplit has finished before running any new Puts/Deletes on the
> > > split regions.
> > >
> > > Right now, the HBase coprocessors offer no easy way to achieve this
> > > level of consistency, because there is no way to distinguish a region
> > > being opened as the result of a split from a regular open. If we could
> > > distinguish them, we could open the correct index from the start and,
> > > in the event of a split, stall until postSplit finishes in the
> > > background. I would therefore like to propose a way to "lock" the
> > > daughter regions when postSplit is called: when we open a daughter
> > > region from a split, we pass in the parent region name alongside it (or
> > > null if there is no parent), so that a region opened from a split can
> > > be told apart from a regular open. I am thinking about submitting a
> > > patch to JIRA but would greatly appreciate any thoughts or suggestions
> > > for another solution to the problem, or perhaps a better patch. I am
> > > using HBase 0.92 for development at the moment.
> > >
> > > Best,
> > > Kevin
> > >
> >
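Kevin's proposal can be sketched as a per-region write gate. This is a minimal Java sketch assuming a hypothetical open hook that receives the parent region name (null for a regular open); `DaughterGate`, `onOpen`, `onPostSplitDone` and `awaitWritable` are illustrative names, and HBase 0.92 does not provide such a hook. A Put or Delete on a daughter region waits on a `CountDownLatch` until postSplit reports that the split index is in place.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: hold writes on daughter regions until postSplit is done. */
public class DaughterGate {

    // One latch per daughter region whose split index is not yet in place.
    private final Map<String, CountDownLatch> gates = new ConcurrentHashMap<>();

    /** Hypothetical open hook: parentRegion is null unless the region comes from a split. */
    public void onOpen(String region, String parentRegion) {
        if (parentRegion != null) {
            gates.put(region, new CountDownLatch(1)); // close the gate for this daughter
        }
    }

    /** Called once postSplit has installed the split index in the daughter. */
    public void onPostSplitDone(String region) {
        CountDownLatch gate = gates.remove(region);
        if (gate != null) {
            gate.countDown(); // release every writer waiting on this daughter
        }
    }

    /**
     * Called from prePut/preDelete. Returns true when the write may proceed,
     * false if the gate stayed closed for timeoutMs or the thread was interrupted.
     */
    public boolean awaitWritable(String region, long timeoutMs) {
        CountDownLatch gate = gates.get(region);
        if (gate == null) {
            return true; // regular open, or postSplit already finished
        }
        try {
            return gate.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        DaughterGate g = new DaughterGate();
        g.onOpen("daughterA", "parent");
        Thread writer = new Thread(() ->
                System.out.println("write proceeded: " + g.awaitWritable("daughterA", 5_000)));
        writer.start();
        Thread.sleep(100); // give the writer time to block on the gate
        g.onPostSplitDone("daughterA");
        writer.join();
    }
}
```

The timeout keeps a stuck postSplit from blocking writers forever; what a caller should do when `awaitWritable` returns false (retry or fail the mutation) is left open in this sketch.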