HBase >> mail # dev >> Improving Coprocessor postSplit/postOpen synchronization

Re: Improving Coprocessor postSplit/postOpen synchronization
(from postSplit)

On Tue, Aug 28, 2012 at 12:53 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:

> What about writing a marker (a file) into the region at split (from
> preSplit) which is then existence checked and read at open (postOpen)? This
> file would contain whatever indexing metadata is required.
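
To make the marker idea concrete, here is a minimal sketch. It uses plain java.nio against a local directory as a stand-in for the region directory, and every name in it (SplitMarkerSketch, MARKER, writeMarker, readMarker) is hypothetical. None of this is HBase API; it only illustrates the write-at-split, check-at-open pattern.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical sketch: a local directory stands in for the region directory.
// None of these names are HBase APIs.
final class SplitMarkerSketch {
    // Hypothetical marker file name.
    static final String MARKER = ".index_split_marker";

    // At split time: record the index metadata the daughter will need.
    static void writeMarker(Path regionDir, String indexMetadata) throws IOException {
        Files.createDirectories(regionDir);
        Files.write(regionDir.resolve(MARKER),
                indexMetadata.getBytes(StandardCharsets.UTF_8));
    }

    // At open time: a present marker means "this region was opened as a split daughter".
    static Optional<String> readMarker(Path regionDir) throws IOException {
        Path marker = regionDir.resolve(MARKER);
        if (!Files.exists(marker)) {
            return Optional.empty();
        }
        return Optional.of(new String(Files.readAllBytes(marker), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("regions");
        Path daughter = root.resolve("daughterA");
        writeMarker(daughter, "parent=regionP");
        System.out.println("daughter marker: " + readMarker(daughter));
        System.out.println("regular region marker: " + readMarker(root.resolve("regular")));
    }
}
```

A real coprocessor would presumably write through the region's filesystem rather than java.nio, and would need to decide when the marker is deleted.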
>
> Also, splits are nearly instant because the daughters are created with
> reference files to the parent, until a later compaction brings the data
> from the parent over. Can you do the same with your indexes? Reason I ask
> is this notion of "ignoring" new data until indexes are available seems
> undesirable.
>
>
> On Mon, Aug 27, 2012 at 11:29 PM, Kevin Shin <[EMAIL PROTECTED]> wrote:
>
>> Hi everyone,
>>
>> A colleague and I were working with HBase coprocessors for secondary
>> indexes and ran into an interesting problem regarding splits
>> and synchronizing the corresponding parent/daughter regions.
>>
>> The goal with splits is to create two new daughter regions with the
>> corresponding splits of the secondary indexes and lock these regions such
>> that Puts/Deletes that occur while postSplit is in progress will be queued
>> up so we don't run into consistency issues. I.e., if a delete gets called
>> before a daughter region receives the split index, that delete would
>> essentially have been ignored, so we would want to wait until postSplit is
>> finished before running any new Puts/Deletes on the split regions.
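
The queueing behaviour described above could look roughly like the following. This is a sketch only: DaughterMutationGate, submit and markIndexReady are hypothetical names, not part of the coprocessor API, and mutations are modelled as plain Runnables.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch, not an HBase API: buffers mutations for a daughter
// region until its share of the split index has been installed.
final class DaughterMutationGate {
    private final Queue<Runnable> pending = new ArrayDeque<>();
    private boolean indexReady = false;

    // Called for each Put/Delete: runs it now, or queues it in arrival order.
    synchronized void submit(Runnable mutation) {
        if (indexReady) {
            mutation.run();
        } else {
            pending.add(mutation);
        }
    }

    // Called once the split index is in place: drains the backlog in order.
    synchronized void markIndexReady() {
        indexReady = true;
        Runnable next;
        while ((next = pending.poll()) != null) {
            next.run();
        }
    }

    public static void main(String[] args) {
        List<String> applied = new ArrayList<>();
        DaughterMutationGate gate = new DaughterMutationGate();
        gate.submit(() -> applied.add("delete row1")); // queued, index not ready yet
        gate.markIndexReady();                         // backlog is applied here
        gate.submit(() -> applied.add("put row2"));    // applied immediately
        System.out.println(applied);
    }
}
```

Running the backlog while holding the lock keeps later mutations from overtaking queued ones, at the cost of blocking writers during the drain.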
>>
>> As of right now, the HBase coprocessors do not easily support a way to
>> achieve this level of consistency in that there is no way to distinguish a
>> region being opened from a split or a regular open. If we could
>> distinguish, we could open up the correct index from the start and stall
>> until postSplit is finished in the background in the event of a split. I
>> would thus like to propose a way to "lock" the daughter regions when
>> postSplit is called. That is, when we open a daughter region from a split,
>> we can pass in the parent region name alongside it (or null if there is no
>> parent) to distinguish a region being opened from a split from a regular open. I am
>> thinking about submitting a patch into JIRA but would greatly appreciate
>> any thoughts or suggestions for another solution to the problem or perhaps
>> a better patch. I am using HBase 0.92 for development at this moment.
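
For discussion, the proposed hook shape might be sketched as below. To be clear, this is not the existing coprocessor API. OpenHookSketch, classify and indexToOpen are hypothetical names that only illustrate passing the parent region name (or null) into the open path.

```java
// Hypothetical hook shape, not the existing coprocessor API: the open
// callback also receives the parent region name, or null when there is none.
final class OpenHookSketch {
    enum OpenKind { REGULAR, FROM_SPLIT }

    // A null parent name means the region is being opened normally.
    static OpenKind classify(String parentRegionName) {
        return parentRegionName == null ? OpenKind.REGULAR : OpenKind.FROM_SPLIT;
    }

    // Picks which index to load at open time, based on how the region was opened.
    static String indexToOpen(String regionName, String parentRegionName) {
        if (classify(parentRegionName) == OpenKind.FROM_SPLIT) {
            return "index split from " + parentRegionName + " for " + regionName;
        }
        return "existing index for " + regionName;
    }

    public static void main(String[] args) {
        System.out.println(indexToOpen("daughterA", "regionP"));
        System.out.println(indexToOpen("regionX", null));
    }
}
```

With this shape, the split case is visible at open time, so the coprocessor can pick the right index from the start and hold writes until the split work completes.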
>>
>> Best,
>> Kevin
>>
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)