HBase >> mail # dev >> Improving Coprocessor postSplit/postOpen synchronization

Re: Improving Coprocessor postSplit/postOpen synchronization
Never mind, I went to look at the code. Should have done that first.

Looking at 0.94 sources, in SplitTransaction, first we notify the master
that the split has happened, and wait for the master to process it (which
opens daughters), and then call up to the CP with the daughter regions as
arguments.

I seem to remember that in my prototype patch for the CP framework, the
postSplit notification let the CP know the split took place and allowed it to
take actions before the master opened the daughters. In any event, that's
not the code now, so it seems what you need here is for us to either move the
postSplit upcall up, prior to master notification, or add another hook at
that location.
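
To make the ordering concrete, here is a minimal sketch of the sequence described above with the extra hook added before master notification. It is a toy model in plain Java, not HBase code: the SplitHooks interface, the preMasterNotify name, and the ".a"/".b" daughter names are all hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the split sequence; none of these types are HBase classes. */
class SplitOrderSketch {

    /** Hypothetical coprocessor-style callbacks (assumed names). */
    interface SplitHooks {
        // Proposed hook: runs before the master is told about the split.
        void preMasterNotify(String parent, String daughterA, String daughterB);
        // Existing position as described for 0.94: runs after the daughters are open.
        void postSplit(String daughterA, String daughterB);
    }

    /** Runs the steps in order and returns a log of what happened. */
    static List<String> runSplit(String parent) {
        final List<String> log = new ArrayList<>();
        SplitHooks hooks = new SplitHooks() {
            @Override
            public void preMasterNotify(String p, String a, String b) {
                log.add("hook preMasterNotify: split index of " + p);
            }
            @Override
            public void postSplit(String a, String b) {
                log.add("hook postSplit: " + a + ", " + b);
            }
        };
        String a = parent + ".a";
        String b = parent + ".b";
        log.add("daughters created");
        hooks.preMasterNotify(parent, a, b); // index work happens before any open
        log.add("master notified, daughters opened");
        hooks.postSplit(a, b);
        return log;
    }

    public static void main(String[] args) {
        for (String line : runSplit("region1")) {
            System.out.println(line);
        }
    }
}
```

With a hook in that position, the CP could split its index before either daughter is open for traffic, which is the window the original question is about.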

On Tue, Aug 28, 2012 at 12:53 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:

> (from postSplit)
>
>
> On Tue, Aug 28, 2012 at 12:53 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
>
>> What about writing a marker (a file) into the region at split (from
>> preSplit) which is then existence checked and read at open (postOpen)? This
>> file would contain whatever indexing metadata is required.
>>
>> Also, splits are nearly instant because the daughters are created with
>> reference files to the parent, until a later compaction brings the data
>> from the parent over. Can you do the same with your indexes? Reason I ask
>> is this notion of "ignoring" new data until indexes are available seems
>> undesirable.
>>
>>
>> On Mon, Aug 27, 2012 at 11:29 PM, Kevin Shin <[EMAIL PROTECTED]> wrote:
>>
>>> Hi everyone,
>>>
>>> A colleague and I were working with HBase coprocessors for secondary
>>> indexes and ran into an interesting problem regarding splits
>>> and synchronizing the corresponding parent/daughter regions.
>>>
>>> The goal with splits is to create two new daughter regions with the
>>> corresponding splits of the secondary indexes and lock these regions such
>>> that Puts/Deletes that occur while postSplit is in progress will be queued
>>> up so we don't run into consistency issues. I.e., if a delete gets called
>>> before a daughter region receives the split index, that delete would
>>> essentially have been ignored, so we would want to wait until postSplit is
>>> finished before running any new Puts/Deletes on the split regions.
>>>
>>> As of right now, the HBase coprocessors do not easily support a way to
>>> achieve this level of consistency in that there is no way to distinguish a
>>> region being opened from a split from a regular open. If we could
>>> distinguish, we could open up the correct index from the start and stall
>>> until postSplit is finished in the background in the event of a split. I
>>> would thus like to propose a way to "lock" the daughter regions when
>>> postSplit is called. That is, when we open a daughter region from a split,
>>> we can pass in the parent region name alongside it (or null if there is no
>>> parent) to distinguish a region being opened from a split from a regular
>>> open. I am thinking about submitting a patch into JIRA but would greatly
>>> appreciate any thoughts or suggestions for another solution to the problem
>>> or perhaps a better patch. I am using HBase 0.92 for development at this
>>> moment.
>>>
>>> Best,
>>> Kevin
>>>
>>
>>
>>
>> --
>> Best regards,
>>
>>    - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
>>
>>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
>
--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
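
For anyone following the thread, here is a rough sketch of the two ideas discussed above: a marker file written at split time and existence-checked at open, plus a gate that holds Puts/Deletes back until the index is ready. It is a toy model using the local filesystem and java.util.concurrent, not HDFS or any HBase API. The marker file name, the choice to write it into the daughter's directory, and the IndexGate class are assumptions made for illustration only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;

/** Toy model; uses the local filesystem, not HDFS or any HBase API. */
class SplitMarkerSketch {
    static final String MARKER = ".index-split-marker"; // hypothetical file name

    /** At split time: record which parent this daughter came from. */
    static void writeMarker(Path daughterDir, String parentName) throws IOException {
        Files.createDirectories(daughterDir);
        Files.write(daughterDir.resolve(MARKER),
                parentName.getBytes(StandardCharsets.UTF_8));
    }

    /** At open time: returns the parent name, or null for a regular open. */
    static String readMarker(Path regionDir) throws IOException {
        Path marker = regionDir.resolve(MARKER);
        if (!Files.exists(marker)) {
            return null;
        }
        return new String(Files.readAllBytes(marker), StandardCharsets.UTF_8);
    }

    /** Holds writers back until the index for the region is ready. */
    static final class IndexGate {
        private final CountDownLatch ready;

        IndexGate(boolean openedFromSplit) {
            // A regular open has nothing to wait for, so the latch starts at zero.
            this.ready = new CountDownLatch(openedFromSplit ? 1 : 0);
        }

        /** Signals that the split index has been installed. */
        void indexReady() { ready.countDown(); }

        /** Blocks the calling writer until indexReady() has been called. */
        void awaitBeforeWrite() throws InterruptedException { ready.await(); }

        boolean isOpenForWrites() { return ready.getCount() == 0; }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("daughter");
        writeMarker(dir, "parent-region");
        String parent = readMarker(dir);
        IndexGate gate = new IndexGate(parent != null);
        System.out.println("parent=" + parent + " writable=" + gate.isOpenForWrites());
        gate.indexReady();
        System.out.println("writable=" + gate.isOpenForWrites());
    }
}
```

Blocking writers this way is exactly the "ignoring new data until indexes are available" behavior questioned earlier in the thread, so the gate is shown only to illustrate the original proposal; the marker file alone is enough to tell a split open from a regular open.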