HBase >> mail # dev >> Improving Coprocessor postSplit/postOpen synchronization

Re: Improving Coprocessor postSplit/postOpen synchronization
Hi Kevin,

> Just to clarify, Andrew: do you have a prototype patch up that could
> potentially be worked on to either move postSplit() or add new hooks into
> the framework, or are you planning on submitting it sometime in the near future?

No, I meant one of the patches I put up on HBASE-2000. Basically, the CP design
is multiversioned in my head, and I skipped over the current version due to a
bug. :-) Sorry for any confusion.

As Ram says in a subsequent email, we could add new upcalls around the PONR
(point of no return) in the split transaction: preSplitPONR and
postSplitPONR... though perhaps the naming is not ideal. I opened
https://issues.apache.org/jira/browse/HBASE-6696
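
The PONR proposal is easiest to see as an ordering of callbacks. Before the point of no return, a failed split can still be rolled back and the parent region keeps serving. After it, the split has to complete. Below is a minimal stdlib-only Java model, not HBase code. `SplitHooks`, `SplitModel`, and the step names are hypothetical. Only the hook names preSplit, preSplitPONR, postSplitPONR and postSplit come from this thread. The model assumes that a failure before the PONR triggers a rollback, so neither PONR hook fires for an aborted split.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical observer interface: hook names follow this thread's proposal,
// but this is NOT the HBase RegionObserver API. Default no-op methods let an
// implementation override only the hooks it cares about.
interface SplitHooks {
    default void preSplit() {}
    default void preSplitPONR() {}   // split can still be rolled back
    default void postSplitPONR() {}  // split is committed; daughters not yet open
    default void postSplit() {}      // daughters are open and may take writes
}

// Hypothetical, heavily simplified split transaction. Returns the sequence of
// steps and hook invocations so the ordering can be inspected.
final class SplitModel {
    static List<String> run(SplitHooks hooks, boolean failBeforePonr) {
        List<String> log = new ArrayList<>();
        hooks.preSplit();
        log.add("preSplit");
        log.add("createDaughters");
        if (failBeforePonr) {
            // Before the PONR the parent can be restored, so neither PONR hook fires.
            log.add("rollback");
            return log;
        }
        hooks.preSplitPONR();
        log.add("preSplitPONR");
        log.add("PONR");
        hooks.postSplitPONR();
        log.add("postSplitPONR");
        log.add("openDaughters");
        hooks.postSplit();
        log.add("postSplit");
        return log;
    }
}
```

Under this model, an index coprocessor could do its fallible work in preSplitPONR, where an error still lets the split be rolled back.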

On Tue, Aug 28, 2012 at 11:59 PM, Kevin Shin <[EMAIL PROTECTED]> wrote:

> Hello again everyone,
>
> Thanks for responding! I really appreciate all of the advice that's been
> given so far.  :)
>
> Just to clarify, Andrew: do you have a prototype patch up that could
> potentially be worked on to either move postSplit() or add new hooks into
> the framework, or are you planning on submitting it sometime in the near future?
>
> I'd also love to get any feedback from the community about where to add
> the hook(s) but my thought was that we should have different levels of
> hooks within a split, as Ramkrishna suggested. Perhaps two preSplits to
> accommodate grabbing, as well as a postSplit and a completeSplit? Giving
> a better abstraction would definitely help developers figure out how to
> deal with asynchronous calls to split, Put, and Delete. Thanks as always!
>
> Best,
> Kevin
>
> On Tue, Aug 28, 2012 at 11:12 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
>> That approach sounds good to me.
>>
>>
>>
>> ----- Original Message -----
>> From: Andrew Purtell <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]
>> Cc:
>> Sent: Tuesday, August 28, 2012 3:05 AM
>> Subject: Re: Improving Coprocessor postSplit/postOpen synchronization
>>
>> Never mind, I went to look at the code. Should have done that first.
>>
>> Looking at the 0.94 sources, in SplitTransaction we first notify the master
>> that the split has happened, and wait for the master to process it (which
>> opens daughters), and then call up to the CP with the daughter regions as
>> arguments.
>>
>> I seem to remember that in my prototype patch for the CP framework, the
>> postSplit notification let the CP know the split took place and allowed it
>> to take actions before the master opened the daughters. In any event, that's
>> not the code now, so it seems what you need here is for us to move the
>> postSplit upcall up prior to master notification, or to add another hook at
>> that location.
>>
>> On Tue, Aug 28, 2012 at 12:53 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
>>
>> > (from postSplit, not preSplit)
>> >
>> >
>> > On Tue, Aug 28, 2012 at 12:53 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
>> >
>> >> What about writing a marker (a file) into the region at split (from
>> >> preSplit), which is then checked for existence and read at open
>> >> (postOpen)? This file would contain whatever indexing metadata is
>> >> required.
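
The marker-file idea amounts to a small write-then-check protocol. The split hook writes the file into each daughter, and postOpen checks for it and reads it. Below is a minimal sketch under several assumptions, all hypothetical: the class `IndexSplitMarker`, the file name `.index-split-marker`, and the key/value format. It also uses the local filesystem through `java.nio.file` in place of HDFS. The writer stages to a temp file and renames it, so a region opening concurrently never reads a half-written marker.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;
import java.util.Properties;

// Hypothetical helper; the marker name and Properties format are assumptions.
final class IndexSplitMarker {
    static final String MARKER_NAME = ".index-split-marker";

    // Split side: record index metadata in a daughter's region directory.
    // Writes a temp file first, then atomically renames it into place.
    static void write(Path regionDir, Properties indexMeta) throws IOException {
        Files.createDirectories(regionDir);
        Path tmp = regionDir.resolve(MARKER_NAME + ".tmp");
        try (Writer w = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            indexMeta.store(w, "secondary index split metadata");
        }
        Files.move(tmp, regionDir.resolve(MARKER_NAME), StandardCopyOption.ATOMIC_MOVE);
    }

    // Open side: empty if this region was not produced by an indexed split.
    static Optional<Properties> read(Path regionDir) throws IOException {
        Path marker = regionDir.resolve(MARKER_NAME);
        if (!Files.exists(marker)) {
            return Optional.empty();
        }
        Properties p = new Properties();
        try (Reader r = Files.newBufferedReader(marker, StandardCharsets.UTF_8)) {
            p.load(r);
        }
        return Optional.of(p);
    }
}
```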
>> >>
>> >> Also, splits are nearly instant because the daughters are created with
>> >> reference files to the parent, until a later compaction brings the data
>> >> over from the parent. Can you do the same with your indexes? The reason I
>> >> ask is that this notion of "ignoring" new data until indexes are
>> >> available seems undesirable.
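
Andrew's question is whether indexes can be split the way store files are. At split time each daughter would get only a cheap reference to the parent's index plus its own row range. Lookups would filter through that range, and a later compaction would copy the daughter's half. Below is a minimal in-memory sketch of that idea, not HBase code. `IndexReference`, the String keys, and the "indexed value -> row keys" layout of the index are all assumptions. Writes that arrive after the split are not modeled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical region-local secondary index: indexed value -> row keys.
// A daughter holds a reference to the parent's index plus its row range,
// analogous to HBase's half-file references.
final class IndexReference {
    private final TreeMap<String, List<String>> parent; // shared, treated as read-only
    private final String startRow; // inclusive; "" means start of table
    private final String endRow;   // exclusive; null means end of table
    private TreeMap<String, List<String>> materialized; // null until compact() runs

    IndexReference(TreeMap<String, List<String>> parent, String startRow, String endRow) {
        this.parent = parent;
        this.startRow = startRow;
        this.endRow = endRow;
    }

    private boolean inRange(String row) {
        return row.compareTo(startRow) >= 0 && (endRow == null || row.compareTo(endRow) < 0);
    }

    // Until compaction, lookups go to the parent index, filtered by row range.
    List<String> lookup(String indexedValue) {
        if (materialized != null) {
            return new ArrayList<>(materialized.getOrDefault(indexedValue, Collections.emptyList()));
        }
        List<String> rows = new ArrayList<>();
        for (String row : parent.getOrDefault(indexedValue, Collections.emptyList())) {
            if (inRange(row)) rows.add(row);
        }
        return rows;
    }

    // "Compaction": copy this daughter's half so the parent reference can be dropped.
    void compact() {
        TreeMap<String, List<String>> copy = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : parent.entrySet()) {
            List<String> rows = new ArrayList<>();
            for (String row : e.getValue()) {
                if (inRange(row)) rows.add(row);
            }
            if (!rows.isEmpty()) copy.put(e.getKey(), rows);
        }
        materialized = copy;
    }

    boolean isMaterialized() {
        return materialized != null;
    }
}
```

Creating the two daughter views costs constant time regardless of index size. That mirrors why reference files make the region split itself nearly instant.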
>> >>
>> >>
>> >> On Mon, Aug 27, 2012 at 11:29 PM, Kevin Shin <[EMAIL PROTECTED]> wrote:
>> >>
>> >>> Hi everyone,
>> >>>
>> >>> A colleague and I were working with HBase coprocessors for secondary
>> >>> indexes and ran into an interesting problem regarding splits
>> >>> and synchronizing the corresponding parent/daughter regions.
>> >>>
>> >>> The goal with splits is to create two new daughter regions with the
>> >>> corresponding splits of the secondary indexes and lock these regions such
>> >>> that Puts/Deletes that occur while postSplit is in progress will be
>
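
Kevin's original message describes locking the daughter regions so that Puts/Deletes arriving while postSplit runs are held back. The archived sentence is cut off, so what happens to those writes is not stated. The sketch below assumes they block, with a timeout, until the daughter's index split is complete. `IndexReadyGate` is a hypothetical name, and wiring it into a real pre-mutation hook is left out.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical one-shot gate per daughter region: closed when the daughter is
// created, opened once its share of the secondary index has been built.
final class IndexReadyGate {
    private final CountDownLatch ready = new CountDownLatch(1);

    // Called by the split hook after the daughter's index split is complete.
    void markReady() {
        ready.countDown();
    }

    // Called before applying a Put/Delete. Returns false if the index did not
    // become ready within the timeout, so the caller can reject or retry the write.
    boolean awaitReady(long timeout, TimeUnit unit) throws InterruptedException {
        return ready.await(timeout, unit);
    }
}
```

The timeout bounds how long a write can stall. That partly addresses Andrew's concern that holding back or "ignoring" new data is undesirable.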