-
Improving Coprocessor postSplit/postOpen synchronization
Kevin Shin, 2012-08-27, 20:29
Hi everyone,
A colleague and I have been working with HBase coprocessors for secondary indexes and ran into an interesting problem with splits and with synchronizing the corresponding parent/daughter regions.

The goal with splits is to create two new daughter regions with the corresponding splits of the secondary indexes, and to lock these regions so that Puts/Deletes that occur while postSplit is in progress are queued up and we don't run into consistency issues. That is, if a delete is called before a daughter region receives its split index, that delete would essentially be ignored, so we want to wait until postSplit has finished before running any new Puts/Deletes on the split regions.

As of right now, the HBase coprocessors do not easily support this level of consistency, because there is no way to distinguish a region being opened from a split from a regular open. If we could distinguish the two, we could open the correct index from the start and, in the event of a split, stall until postSplit finishes in the background. I would thus like to propose a way to "lock" the daughter regions when postSplit is called. That is, when we open a daughter region from a split, we can pass in the parent region name alongside it (or null if there is no parent) to distinguish a split open from a regular open. I am thinking about submitting a patch to JIRA but would greatly appreciate any thoughts or suggestions for another solution to the problem, or perhaps a better patch. I am using HBase 0.92 for development at the moment.

Best,
Kevin
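The queuing behaviour described above can be sketched in plain Java, without any HBase types. This is only an illustration under stated assumptions: the class DaughterIndexGate, its methods, and the use of strings to stand in for Puts/Deletes are all hypothetical, and a null parent region name models a regular open as proposed in the message.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical sketch: holds back index mutations for a daughter region until its split index is ready. */
final class DaughterIndexGate {
    private final Queue<String> pending = new ArrayDeque<>();
    private final List<String> applied = new ArrayList<>();
    private boolean indexReady;

    /** A null parentRegionName models a regular open, so there is nothing to wait for. */
    DaughterIndexGate(String parentRegionName) {
        this.indexReady = (parentRegionName == null);
    }

    /** Called for each Put/Delete; queues it while the split index is still being built. */
    synchronized void submit(String mutation) {
        if (indexReady) {
            applied.add(mutation);
        } else {
            pending.add(mutation);
        }
    }

    /** Called once the postSplit work has finished; replays queued mutations in arrival order. */
    synchronized void splitIndexReady() {
        indexReady = true;
        while (!pending.isEmpty()) {
            applied.add(pending.remove());
        }
    }

    /** Returns a copy of the mutations applied so far. */
    synchronized List<String> applied() {
        return new ArrayList<>(applied);
    }

    synchronized int pendingCount() {
        return pending.size();
    }
}
```

A gate created with a parent name queues every mutation until splitIndexReady() is called, so a delete that arrives early is replayed rather than lost. A gate created with null applies mutations immediately.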
-
Re: Improving Coprocessor postSplit/postOpen synchronization
Ted Yu, 2012-08-27, 20:49
Ramkrishna recently checked in HBASE-6633

But that doesn't seem to address your use case.

Go ahead and file a JIRA.
-
Re: Improving Coprocessor postSplit/postOpen synchronization
Kevin Shin, 2012-08-27, 22:09
Thanks Ted,
As a better approach than adding code to pre/postOpen, we're going to see if we can add one more coprocessor call to keep splits and opens modular. Will submit a patch soon.

Best,
Kevin
-
RE: Improving Coprocessor postSplit/postOpen synchronization
Ramkrishna.S.Vasudevan, 2012-08-28, 06:57
Hi Kevin
I am very interested to see this. We have done something similar internally, but along with the new coprocessor hooks that we added, we also tweaked the kernel side a bit.

It goes something like this. Divide your split steps into two parts: steps before the PONR (point of no return) and steps after the PONR.
First do the steps before the PONR for the main region.
Then do them for the index region in the preSplit hook.
Now populate the info that you need in a thread local and read it on the kernel side.
Use this info to make a single put entry to META, so that you can offline the parent regions of both the index and the main region.
Now do the after-PONR steps for the index region and then for the main region.

What do you think of this approach?

Now, talking about rollback: the rollback has to go in the reverse order, that is, roll back the index region and then the main region. But on any failure after the PONR, just abort the RS, irrespective of the main and index region, so that the restart scenarios can handle it, since the OFFLINING step was done atomically as suggested above.

Looking forward to your patch as well.

Regards
Ram
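The ordering Ram describes (pre-PONR steps for main then index, one atomic META update, post-PONR steps for index then main, reverse-order rollback before the PONR, abort after it) can be sketched as a small state machine. This is a plain-Java illustration, not HBase code: CoordinatedSplit, the step names, and AbortServerException are hypothetical, and treating the META put itself as the PONR is an assumption of the sketch. Each step is journaled before it is attempted so that a partially completed step is also undone.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of a split coordinated across a main region and its index region. */
final class CoordinatedSplit {
    /** Signals that the region server should abort (a failure at or after the PONR). */
    static final class AbortServerException extends RuntimeException {
        AbortServerException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Trace of completed steps and rollbacks, for inspection. */
    final List<String> log = new ArrayList<>();

    /** Runs the split; failAt names a step that should fail, or null for a clean run. */
    void run(String failAt) {
        Deque<String> journal = new ArrayDeque<>(); // attempted pre-PONR steps, newest first
        try {
            journal.push("main");
            step("prePONR:main", failAt);
            journal.push("index");
            step("prePONR:index", failAt);
        } catch (IllegalStateException e) {
            // Before the PONR: undo in reverse order (index first, then main).
            while (!journal.isEmpty()) {
                log.add("rollback:" + journal.pop());
            }
            return;
        }
        try {
            step("offlineParentsInMeta", failAt); // single atomic META put, assumed to be the PONR
            step("postPONR:index", failAt);
            step("postPONR:main", failAt);
        } catch (IllegalStateException e) {
            // After the PONR there is no rollback; the server aborts and restart handling takes over.
            throw new AbortServerException("failure after PONR", e);
        }
    }

    private void step(String name, String failAt) {
        if (name.equals(failAt)) {
            throw new IllegalStateException("failed at " + name);
        }
        log.add(name);
    }
}
```

Calling run("prePONR:index") leaves a log ending in rollback:index followed by rollback:main, while run("postPONR:index") throws AbortServerException without rolling anything back.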
-
Re: Improving Coprocessor postSplit/postOpen synchronization
Andrew Purtell, 2012-08-28, 09:53
What about writing a marker (a file) into the region at split (from preSplit), which is then checked for existence and read at open (postOpen)? This file would contain whatever indexing metadata is required.

Also, splits are nearly instant because the daughters are created with reference files to the parent, until a later compaction brings the data over from the parent. Can you do the same with your indexes? The reason I ask is that this notion of "ignoring" new data until indexes are available seems undesirable.

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
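A minimal sketch of the marker-file idea follows, using java.nio.file on a local directory to stand in for the region directory. A real coprocessor would presumably go through the Hadoop FileSystem API instead; that substitution, the class name SplitMarker, the file name, and storing only the parent region name as the metadata are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical sketch: a marker file dropped into a daughter's directory at split time. */
final class SplitMarker {
    /** Made-up file name, used only for this illustration. */
    static final String MARKER_NAME = ".index-split-marker";

    /** Split side: records the parent region name as the indexing metadata. */
    static void write(Path daughterDir, String parentRegionName) throws IOException {
        Files.createDirectories(daughterDir);
        Files.write(daughterDir.resolve(MARKER_NAME),
                parentRegionName.getBytes(StandardCharsets.UTF_8));
    }

    /** Open side: a present marker means this open follows a split; empty means a regular open. */
    static Optional<String> read(Path regionDir) throws IOException {
        Path marker = regionDir.resolve(MARKER_NAME);
        if (!Files.exists(marker)) {
            return Optional.empty();
        }
        return Optional.of(new String(Files.readAllBytes(marker), StandardCharsets.UTF_8));
    }
}
```

With this shape, the open-time hook can tell a split open from a regular open by checking whether read(...) returns a value, without any change to the hook signatures.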
-
Re: Improving Coprocessor postSplit/postOpen synchronization
Andrew Purtell, 2012-08-28, 09:53
Correction to my previous message: the marker would be written "(from postSplit)", not "(from preSplit)".
-
Re: Improving Coprocessor postSplit/postOpen synchronization
Andrew Purtell, 2012-08-28, 10:05
Never mind, I went to look at the code. Should have done that first.
Looking at the 0.94 sources, in SplitTransaction we first notify the master that the split has happened and wait for the master to process it (which opens the daughters), and only then call up to the CP with the daughter regions as arguments.

I seem to remember that in my prototype patch for the CP framework, the postSplit notification let the CP know the split had taken place and allowed it to take action before the master opened the daughters. In any event that's not the code now, so it seems what you need here is for us to move the postSplit upcall to before the master notification, or to add another hook at that location.
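The ordering issue can be made concrete with a toy model of the split sequence. Nothing here is the real SplitTransaction or coprocessor API: SplitObserverSketch, beforeMasterNotified, and SplitOrderingSketch are hypothetical names, and the "master notified" step is reduced to a trace entry. The sketch only shows where an extra upcall would sit relative to the existing postSplit call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical observer; beforeMasterNotified is an invented name for the proposed extra upcall. */
interface SplitObserverSketch {
    /** Proposed upcall, invoked before the master is told about the split. */
    default void beforeMasterNotified(List<String> daughters) { }

    /** Existing-style upcall, invoked after the master has opened the daughters. */
    default void postSplit(List<String> daughters) { }
}

/** Toy model of the split sequence, recording the order in which things happen. */
final class SplitOrderingSketch {
    final List<String> trace = new ArrayList<>();

    void split(String parent, SplitObserverSketch observer) {
        List<String> daughters = Arrays.asList(parent + "-A", parent + "-B");
        // Proposed location: the coprocessor can prepare its index before any daughter is open.
        observer.beforeMasterNotified(daughters);
        // Stand-in for notifying the master and waiting for it to open the daughters.
        trace.add("master notified; daughters opened");
        // Location of the postSplit upcall as described for 0.94 in the message above.
        observer.postSplit(daughters);
    }
}
```

Because both methods are default no-ops, an observer that only cares about the existing postSplit call would not need to change if the extra upcall were added.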
-
Re: Improving Coprocessor postSplit/postOpen synchronization
lars hofhansl, 2012-08-28, 18:12
That approach sounds good to me.
-
Re: Improving Coprocessor postSplit/postOpen synchronization
Kevin Shin, 2012-08-28, 20:59
Hello again everyone,
Thanks for responding! I really appreciate all of the advice that's been given so far. :)

Just to clarify, Andrew: do you have a prototype patch up that could be worked on to either move postSplit() or add new hooks to the framework, or are you planning to submit one sometime in the near future?

I'd also love to get feedback from the community about where to add the hook(s), but my thought was that we should have different levels of hooks within a split, as Ramkrishna suggested. Perhaps two preSplits to accommodate grabbing, as well as a postSplit and a completeSplit? Giving a better abstraction would definitely help developers figure out how to deal with asynchronous calls to split, Put, and Delete. Thanks as always!

Best,
Kevin
-
RE: Improving Coprocessor postSplit/postOpen synchronization
Ramkrishna.S.Vasudevan, 2012-08-29, 04:38
Hi
As per Andrew, postSplit needs to move up, even before the transitioning of the ZK nodes is done. But with splits, once META is updated with the new daughter info, clients will tend to send data to those updated regions.

So it is better to add a new hook, postSplitBeforeDaughterOpeningDaughterRegions() (the name may not be quite right, we can change it), just after the PONR step is completed. This will help the code in the CP to know what action to take for the region the CP hooks are handling.

I deprecated the postSplit() hook when I gave a patch. I think we can bring it back, considering Andy's point and the current use case under discussion.

Regards
Ram
-
Re: Improving Coprocessor postSplit/postOpen synchronization
Andrew Purtell, 2012-08-30, 06:48
Hi Kevin,
> Just to clarify Andrew do you have a prototype patch up that could potentially be worked on to either move postSplit() or add new hooks into the framework/are planning on submitting it sometime in the near future?

No, I meant one of the patches I put up on HBASE-2000. Basically the CP design is multiversioned in my head, and I skipped over the current version due to a bug. :-) Sorry for any confusion.

As Ram says in a subsequent email, we could add a new upcall for the PONR in the split transaction, preSplitPONR and postSplitPONR, though the naming is perhaps not ideal.

I opened https://issues.apache.org/jira/browse/HBASE-6696