HBase >> mail # dev >> Question on Coprocessors and Atomicity

Re: Question on Coprocessors and Atomicity
I understand - yes, if constraints are limited to local variables and
there is no "read & put" being done, we're fine. I think I
misunderstood the original intent of "constraint" to include read &
put and missed the "intercept and check whether the column values are
of the right data type / format" use case.

I think what you explained sounds very reasonable, and we can defer the
check & put constraint to HBASE-4999.

Thanks for the detailed clarification.

On Sun, Dec 11, 2011 at 10:26 PM, Jesse Yates <[EMAIL PROTECTED]> wrote:
> Adding constraints was originally developed with the idea of checking
> incoming writes for validity, based on their internal properties. For
> instance, checking to make sure that a value is in the range 1-10, or that
> it is an integer, or not over a certain length (you get the idea).
>> If the row lock is released between the time the coprocessor finishes
>> "preXXXX" checks and the core mutation method is invoked (as has been
>> discussed in this thread), how can the Constraint be ensured? If two
>> requests are being processed in parallel, there is every possibility
>> that both requests pass the "Constraint" check individually, but break
>> it together (e.g. even simple checks like column value == 10 would
>> break if two requests fire concurrently).
>> So - I'm questioning whether a pure Coprocessor implementation alone
>> would be sufficient?
> If you have a situation where you are just doing the straight put (or
> checkAndPut) then the constraint will not affect the atomicity or
> consistency of writes to the table. Even if one of the puts finishes the
> constraint first, the single row will not be corrupted and the 'fastest'
> write will win. The underlying region takes care of breaking the tie and
> ensuring that writes get serialized to the table.
> It's true that the transactional nature will be an issue if you are doing a
> read on the table first. From HRegion.internalPut(), we can see that we
> first get a read lock and then, when doing the put, actually serialize the
> writes into the memstore, enforcing the consistency and atomicity of the
> update. However, nothing ensures that the prePut check is covered by that
> atomicity. From internalPut(), we see:
> /* run pre put hook outside of lock to avoid deadlock */
>    if (coprocessorHost != null) {
>      if (coprocessorHost.prePut(put, walEdit, writeToWAL)) {
>        return;
>      }
>    }
> So yes, this doesn't ensure that we are going to get specific ordering or
> even a fully consistent view of the underlying data. However, as long as
> each constraint just uses local variables, each thread interacting with the
> constraint will be completely fine.
> Yeah, some documentation needs to be done on constraints as far as not
> using static fields or expecting multiple thread interaction. However, the
> checking ensures that the field passes - the underlying store takes care of
> ensuring atomicity and consistency.
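
A minimal sketch of the kind of stateless check described above, assuming a made-up `ValueConstraint` interface (this is not the HBase constraint API, and `RangeOneToTenConstraint` is a hypothetical name). It only illustrates a check that looks at the incoming value, uses local variables, and keeps no static or shared state:

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical interface for illustration only; not the HBase constraint API. */
interface ValueConstraint {
    /** Returns true when the incoming value is acceptable. */
    boolean accepts(byte[] value);
}

/** Stateless check: the value must be an integer in the range 1-10. */
final class RangeOneToTenConstraint implements ValueConstraint {
    @Override
    public boolean accepts(byte[] value) {
        if (value == null) {
            return false;
        }
        // Only local variables are used here, so threads running this
        // check at the same time cannot interfere with each other.
        String text = new String(value, StandardCharsets.UTF_8).trim();
        try {
            int parsed = Integer.parseInt(text);
            return parsed >= 1 && parsed <= 10;
        } catch (NumberFormatException e) {
            return false; // not an integer at all
        }
    }
}
```

Because the check never reads the table or any shared field, its result depends only on the write being validated, which is the case the thread treats as safe.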
> Take your example of column value == 10. Two different requests go into the
> HRegion. They each run the Equals10Constraint in a different thread. The
> constraint allows each to pass (let's assume here that both write the value 10).
> Then it will hit the remainder of HRegion.internalPut(), go through the
> updatesLock and then also get serialized in the call to
> HRegion.applyFamilyMapToMemstore(familyMap, null). So the atomicity of the
> writes is preserved. This also works for n concurrent writes, even when not
> all writes pass the constraint.
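
The "== 10" scenario above can be imitated with a toy model. This is only a sketch under stated assumptions: `ToyRegion` is a hypothetical stand-in for a region holding a single cell, its lock merely plays the role of the serialization step in the description, and none of it is HBase code. The check runs outside the lock and looks only at the incoming value; the writes that pass are then applied one at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.IntPredicate;

/** Toy stand-in for a region holding one cell; not HBase code. */
final class ToyRegion {
    private final ReentrantLock updatesLock = new ReentrantLock();
    private final IntPredicate constraint;
    private final List<Integer> appliedWrites = new ArrayList<>();
    private Integer currentValue; // null until the first accepted write

    ToyRegion(IntPredicate constraint) {
        this.constraint = constraint;
    }

    /** Returns true if the write was applied, false if the check rejected it. */
    boolean put(int value) {
        // Pre-put style check: runs outside the lock and inspects
        // only the incoming value, never the stored one.
        if (!constraint.test(value)) {
            return false;
        }
        updatesLock.lock();
        try {
            // Accepted writes are applied one at a time.
            appliedWrites.add(value);
            currentValue = value;
            return true;
        } finally {
            updatesLock.unlock();
        }
    }

    Integer currentValue() {
        updatesLock.lock();
        try {
            return currentValue;
        } finally {
            updatesLock.unlock();
        }
    }

    List<Integer> appliedWrites() {
        updatesLock.lock();
        try {
            return new ArrayList<>(appliedWrites);
        } finally {
            updatesLock.unlock();
        }
    }

    /** Fires one thread per value at a region whose check is "value == 10". */
    static ToyRegion runConcurrently(int[] values) {
        ToyRegion region = new ToyRegion(v -> v == 10);
        CountDownLatch start = new CountDownLatch(1);
        List<Thread> threads = new ArrayList<>();
        for (int value : values) {
            Thread t = new Thread(() -> {
                try {
                    start.await(); // release all writers together
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                region.put(value);
            });
            t.start();
            threads.add(t);
        }
        start.countDown();
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", e);
        }
        return region;
    }
}
```

Calling `ToyRegion.runConcurrently(new int[] {10, 10, 7, 10, 3})` applies exactly the three writes that pass the check, in some serialized order, and rejects the other two. The model deliberately leaves out the read-then-put case: a check that consulted `currentValue` outside the lock would not get this guarantee, which is the concern the thread defers to HBASE-4999.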
>> I think we'll need an approach that makes the constraint checking and
>> mutation to be _atomically_ achieved
>> a) either by taking a row lock and passing that into put / checkAndPut
>> b) referencing & checking the constraint directly from within the put
>> / checkAndPut methods (like we do with the comparator, for instance)
> However, if we wanted to do a read of the table, within the coprocessor, we
> will want to actually pass in the rowlock so we can ensure that we don't