HBase >> mail # dev >> Improving Coprocessor postSplit/postOpen synchronization


Re: Improving Coprocessor postSplit/postOpen synchronization
Hi Kevin,

> Just to clarify, Andrew: do you have a prototype patch up that could
> potentially be worked on to either move postSplit() or add new hooks into
> the framework, or are you planning on submitting one sometime in the near future?

No, I meant one of the patches I put up on HBASE-2000. Basically, the CP design
is multiversioned in my head and I skipped over the current version due to a
bug. :-) Sorry for any confusion.

Like Ram says in a subsequent email, we could add a new upcall for the PONR
(point of no return) in the split transaction: preSplitPONR and
postSplitPONR, though perhaps the naming is not ideal. I opened
https://issues.apache.org/jira/browse/HBASE-6696
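A minimal Java sketch of where the two proposed upcalls would sit relative to the PONR. `SplitPonrObserver`, `PonrSplitSketch`, and the step names are hypothetical stand-ins, not HBase APIs, and the split transaction is reduced to a list of steps. The one property it is meant to show: a hook before the PONR can still veto the split and trigger a rollback, while a hook after it runs once the split is committed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hooks named after the thread's proposal; not an existing HBase API.
interface SplitPonrObserver {
    // Before the point of no return: throwing here aborts and rolls back the split.
    default void preSplitPONR(String parent) throws Exception {}

    // After the point of no return: the split will complete. In this model the
    // master has not been notified yet, so the daughters are not yet serving.
    default void postSplitPONR(String parent, String daughterA, String daughterB) {}
}

public class PonrSplitSketch {
    // Simplified split transaction; returns the steps it performed, in order.
    static List<String> split(String parent, SplitPonrObserver cp) {
        List<String> steps = new ArrayList<>();
        steps.add("create daughters");
        steps.add("preSplitPONR");
        try {
            cp.preSplitPONR(parent);
        } catch (Exception e) {
            steps.add("rollback");
            return steps;
        }
        steps.add("PONR");
        steps.add("postSplitPONR");
        cp.postSplitPONR(parent, parent + "-a", parent + "-b");
        steps.add("notify master");
        return steps;
    }

    public static void main(String[] args) {
        // An observer that accepts the split.
        System.out.println(split("r1", new SplitPonrObserver() {}));
        // An observer that vetoes the split before the PONR.
        System.out.println(split("r1", new SplitPonrObserver() {
            @Override
            public void preSplitPONR(String parent) throws Exception {
                throw new Exception("index split not ready");
            }
        }));
    }
}
```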

On Tue, Aug 28, 2012 at 11:59 PM, Kevin Shin <
[EMAIL PROTECTED]> wrote:

> Hello again everyone,
>
> Thanks for responding! I really appreciate all of the advice that's been
> given so far.  :)
>
> Just to clarify, Andrew: do you have a prototype patch up that could
> potentially be worked on to either move postSplit() or add new hooks into
> the framework, or are you planning on submitting one sometime in the near future?
>
> I'd also love to get any feedback from the community about where to add
> the hook(s), but my thought was that we should have different levels of
> hooks within a split, as Ramkrishna suggested. Perhaps two preSplits to
> accommodate grabbing, as well as a postSplit and a completeSplit? A
> better abstraction would definitely help developers figure out how to
> deal with asynchronous calls to split, Put, and Delete. Thanks as always!
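One way to read the four-stage proposal, sketched in Java under stated assumptions: "grabbing" is taken to mean acquiring a lock that the Put/Delete paths also take, and the stage names (`preSplitGrab`, `preSplit`, `postSplit`, `completeSplit`) are hypothetical, not hooks the framework provides. A `ReentrantReadWriteLock` lets many mutations proceed concurrently while a split holds the write lock exclusively from the grab stage until `completeSplit`.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical four-stage split hooks guarding a secondary index.
public class StagedSplitIndexGuard {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Stage 1 ("grab"): exclude new Puts/Deletes from touching the index.
    public void preSplitGrab() { lock.writeLock().lock(); }

    // Stage 2: split the index entries by the split key (elided in this sketch).
    public void preSplit() { }

    // Stage 3: attach the index halves to the daughters (elided in this sketch).
    public void postSplit() { }

    // Stage 4: release the lock so mutations resume.
    public void completeSplit() { lock.writeLock().unlock(); }

    // Put/Delete path: returns false if a split currently holds the write lock.
    public boolean tryMutate() {
        if (!lock.readLock().tryLock()) {
            return false;
        }
        try {
            return true; // the index update would be applied here
        } finally {
            lock.readLock().unlock();
        }
    }

    // Calls tryMutate from another thread, because the splitting thread could
    // reacquire the lock itself (the lock is reentrant and allows downgrading).
    static boolean mutateFromOtherThread(StagedSplitIndexGuard g) throws InterruptedException {
        boolean[] result = new boolean[1];
        Thread t = new Thread(() -> result[0] = g.tryMutate());
        t.start();
        t.join(); // join() makes the write to result[0] visible here
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException {
        StagedSplitIndexGuard g = new StagedSplitIndexGuard();
        g.preSplitGrab();
        g.preSplit();
        System.out.println(mutateFromOtherThread(g)); // false: split in progress
        g.postSplit();
        g.completeSplit();
        System.out.println(mutateFromOtherThread(g)); // true: split finished
    }
}
```

A real coprocessor would more likely block or retry the mutation than return `false`; the non-blocking `tryLock` just makes the exclusion easy to observe.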
>
> Best,
> Kevin
>
> On Tue, Aug 28, 2012 at 11:12 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
>> That approach sounds good to me.
>>
>>
>>
>> ----- Original Message -----
>> From: Andrew Purtell <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]
>> Cc:
>> Sent: Tuesday, August 28, 2012 3:05 AM
>> Subject: Re: Improving Coprocessor postSplit/postOpen synchronization
>>
>> Never mind, I went to look at the code. Should have done that first.
>>
>> Looking at 0.94 sources, in SplitTransaction, first we notify the master
>> that the split has happened, and wait for the master to process it (which
>> opens daughters), and then call up to the CP with the daughter regions as
>> arguments.
>>
>> I seem to remember that in my prototype patch for the CP framework,
>> the postSplit notification let the CP know the split took place and
>> allowed it to take actions before the master opened the daughters. In any
>> event, that's not what the code does now, so it seems what you need here is
>> for us to move the postSplit upcall ahead of master notification or add
>> another hook at that location.
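The ordering difference can be modeled in a few lines of Java. This is a simplified sketch, not `SplitTransaction` itself: it follows the 0.94 order as described in the message above and shows why a postSplit that runs after the daughters open leaves a window in which clients can write before the coprocessor learns of the split. The class and step names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class PostSplitOrderingSketch {
    // Simplified split. hookBeforeMaster=false mirrors the 0.94 order described
    // in the thread; true models moving the upcall ahead of master notification.
    static List<String> split(boolean hookBeforeMaster) {
        List<String> steps = new ArrayList<>();
        steps.add("split parent into daughters");
        if (hookBeforeMaster) steps.add("postSplit");
        steps.add("notify master");
        steps.add("daughters opened");
        if (!hookBeforeMaster) steps.add("postSplit");
        return steps;
    }

    // True if clients could reach a daughter before the coprocessor hears about the split.
    static boolean writesCanPrecedeHook(List<String> steps) {
        return steps.indexOf("daughters opened") < steps.indexOf("postSplit");
    }

    public static void main(String[] args) {
        System.out.println(writesCanPrecedeHook(split(false))); // true: 0.94 order
        System.out.println(writesCanPrecedeHook(split(true)));  // false: moved upcall
    }
}
```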
>>
>> On Tue, Aug 28, 2012 at 12:53 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
>>
>> > (from postSplit)
>> >
>> >
>> > On Tue, Aug 28, 2012 at 12:53 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
>> >
>> >> What about writing a marker (a file) into the region at split (from
>> >> preSplit) which is then existence-checked and read at open (postOpen)?
>> >> This file would contain whatever indexing metadata is required.
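A minimal Java sketch of the marker-file idea. It assumes a local temporary directory stands in for the region's directory, and the marker name `.index-split-marker` and its `key=value` contents are hypothetical choices. In a real coprocessor the write would happen in preSplit and the existence check and read in postOpen, presumably through the Hadoop `FileSystem` API rather than `java.nio.file`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

public class SplitMarkerSketch {
    // Hypothetical marker file name; not an HBase convention.
    static final String MARKER = ".index-split-marker";

    // preSplit side: persist the index metadata the daughter will need.
    static void writeMarker(Path regionDir, String indexMetadata) throws IOException {
        Files.createDirectories(regionDir);
        Files.write(regionDir.resolve(MARKER), indexMetadata.getBytes(StandardCharsets.UTF_8));
    }

    // postOpen side: existence-check the marker, then read it if present.
    static Optional<String> readMarker(Path regionDir) throws IOException {
        Path marker = regionDir.resolve(MARKER);
        if (!Files.exists(marker)) {
            return Optional.empty();
        }
        return Optional.of(new String(Files.readAllBytes(marker), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // A local temp directory stands in for the region's directory.
        Path region = Files.createTempDirectory("region");
        System.out.println(readMarker(region)); // Optional.empty
        writeMarker(region, "splitKey=row500");
        System.out.println(readMarker(region)); // Optional[splitKey=row500]
    }
}
```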
>> >>
>> >> Also, splits are nearly instant because the daughters are created with
>> >> reference files to the parent, until a later compaction brings the data
>> >> over from the parent. Can you do the same with your indexes? The reason I
>> >> ask is that this notion of "ignoring" new data until indexes are
>> >> available seems undesirable.
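A hedged Java sketch of what the reference-file approach might look like for an index: each daughter initially holds only a reference to the parent's index plus the split key, and a later compaction copies its half into an index of its own. It assumes, hypothetically, that the index maps each indexed value to a single row key. Because such an index is sorted by value rather than by row key, the reference filters entries by the row they point to instead of taking a key range, unlike a store-file half-reference.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class IndexReferenceSketch {
    // A daughter's secondary index. Until compaction it only references the
    // parent's index and filters entries to its half of the row-key space.
    static final class DaughterIndex {
        private NavigableMap<String, String> parent; // indexed value -> row key
        private NavigableMap<String, String> own;    // null until compacted
        private final String splitKey;
        private final boolean top;                   // true: rows >= splitKey

        DaughterIndex(NavigableMap<String, String> parent, String splitKey, boolean top) {
            this.parent = parent;
            this.splitKey = splitKey;
            this.top = top;
        }

        // Does a row key fall in this daughter's half?
        boolean owns(String row) {
            int c = row.compareTo(splitKey);
            return top ? c >= 0 : c < 0;
        }

        // Look up the row for an indexed value, whichever form the index is in.
        String lookup(String value) {
            if (own != null) {
                return own.get(value);
            }
            String row = parent.get(value);
            return (row != null && owns(row)) ? row : null;
        }

        // Later compaction: copy this daughter's entries, then drop the parent reference.
        void compact() {
            own = new TreeMap<>();
            for (Map.Entry<String, String> e : parent.entrySet()) {
                if (owns(e.getValue())) {
                    own.put(e.getKey(), e.getValue());
                }
            }
            parent = null;
        }

        boolean referencesParent() { return parent != null; }
    }

    public static void main(String[] args) {
        NavigableMap<String, String> parentIndex = new TreeMap<>();
        parentIndex.put("blue", "row100");
        parentIndex.put("red", "row700");

        // The split itself copies no index entries.
        DaughterIndex bottom = new DaughterIndex(parentIndex, "row500", false);
        DaughterIndex top = new DaughterIndex(parentIndex, "row500", true);
        System.out.println(bottom.lookup("blue") + " " + bottom.lookup("red")); // row100 null
        System.out.println(top.lookup("red"));                                    // row700

        top.compact();
        System.out.println(top.lookup("red") + " " + top.referencesParent());    // row700 false
    }
}
```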
>> >>
>> >>
>> >> On Mon, Aug 27, 2012 at 11:29 PM, Kevin Shin <
>> >> [EMAIL PROTECTED]> wrote:
>> >>
>> >>> Hi everyone,
>> >>>
>> >>> A colleague and I were working with HBase coprocessors for secondary
>> >>> indexes and ran into an interesting problem regarding splits
>> >>> and synchronizing the corresponding parent/daughter regions.
>> >>>
>> >>> The goal with splits is to create two new daughter regions with the
>> >>> corresponding splits of the secondary indexes and lock these regions
>> >>> such that Puts/Deletes that occur while postSplit is in progress will be
>