Re: Question on Coprocessors and Atomicity
I understand - yes, if constraints are limited to local variables and
there is no "read & put" being done, we're fine. I think I
misunderstood the original intent of "constraint" to include read &
put and missed the "intercept and check whether the column values are
of the right data type / format" use case.

I think what you explained sounds very reasonable, and we can defer the
check & put constraint to HBASE-4999.

Thanks for the detailed clarification.
--Suraj

On Sun, Dec 11, 2011 at 10:26 PM, Jesse Yates <[EMAIL PROTECTED]> wrote:
> Adding constraints was originally developed with the idea of checking
> incoming writes for validity, based on their internal properties. For
> instance, checking to make sure that a value is in the range 1-10, or that
> it is an integer, or not over a certain length (you get the idea).
>
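> For concreteness, a minimal sketch of such a validity check, assuming the
> Constraint/BaseConstraint hook from HBASE-4605 exposes a check(Put) method
> (the class name and messages here are illustrative, not the actual patch):
>
>    import java.util.List;
>    import org.apache.hadoop.hbase.KeyValue;
>    import org.apache.hadoop.hbase.client.Put;
>    import org.apache.hadoop.hbase.constraint.BaseConstraint;
>    import org.apache.hadoop.hbase.constraint.ConstraintException;
>    import org.apache.hadoop.hbase.util.Bytes;
>
>    // Reject any put whose values are not 4-byte ints in [1, 10].
>    public class IntRangeConstraint extends BaseConstraint {
>      @Override
>      public void check(Put put) throws ConstraintException {
>        for (List<KeyValue> kvs : put.getFamilyMap().values()) {
>          for (KeyValue kv : kvs) {
>            byte[] value = kv.getValue();
>            if (value.length != Bytes.SIZEOF_INT) {
>              throw new ConstraintException("value is not a 4-byte int");
>            }
>            int v = Bytes.toInt(value);
>            if (v < 1 || v > 10) {
>              throw new ConstraintException("value " + v + " outside [1, 10]");
>            }
>          }
>        }
>      }
>    }
>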
>> If the row lock is released between the time the coprocessor finishes
>> "preXXXX" checks and the core mutation method is invoked (as has been
>> discussed in this thread), how can the Constraint be ensured? If two
>> requests are being processed in parallel, there is every possibility
>> that both requests pass the "Constraint" check individually, but break
>> it together (e.g. even simple checks like column value == 10 would
>> break if two requests fire concurrently).
>>
>> So - I'm questioning whether a pure Coprocessor implementation alone
>> would be sufficient?
>>
>
> If you have a situation where you are just doing a straight put (or
> checkAndPut), then the constraint will not affect the atomicity or
> consistency of writes to the table. Even if one of the puts finishes the
> constraint check first, the single row will not be corrupted and the
> 'fastest' write will win. The underlying region takes care of breaking the
> tie and ensuring that writes get serialized to the table.
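>
> As a point of comparison, the read-check-write that the region already does
> atomically for you is checkAndPut, where the row lock is held across both
> the compare and the write; a rough sketch against the 0.92-era client API
> (table and column names made up):
>
>    import org.apache.hadoop.hbase.HBaseConfiguration;
>    import org.apache.hadoop.hbase.client.HTable;
>    import org.apache.hadoop.hbase.client.Put;
>    import org.apache.hadoop.hbase.util.Bytes;
>
>    HTable table = new HTable(HBaseConfiguration.create(), "t1");
>    Put put = new Put(Bytes.toBytes("row1"));
>    put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes(11));
>    // Atomic in the region: apply the put only if the value is still 10.
>    boolean applied = table.checkAndPut(Bytes.toBytes("row1"),
>        Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes(10), put);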
>
> It's true that the transactional nature will be an issue if you are doing a
> read on the table first. From HRegion.internalPut(), we can see that we
> first take a read lock and then, when doing the put, actually serialize the
> writes into the memstore, enforcing the consistency and atomicity of the
> update. However, there is no check that the prePut hook is going to
> enforce that atomicity. From internalPut, we see:
>
> /* run pre put hook outside of lock to avoid deadlock */
>    if (coprocessorHost != null) {
>      if (coprocessorHost.prePut(put, walEdit, writeToWAL)) {
>        return;
>      }
>    }
>
> So yes, this doesn't ensure that we are going to get specific ordering or
> even a fully consistent view of the underlying data. However, as long as
> each constraint just uses local variables, each thread interacting with the
> constraint will be completely fine.
>
> Yeah, some documentation needs to be done on constraints as far as not
> using static fields or expecting interaction across multiple threads.
> However, the constraint check only ensures that the value passes - the
> underlying store takes care of ensuring atomicity and consistency.
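>
> To make the local-variables caveat concrete, here is the kind of constraint
> that would NOT be safe (class name made up; same imports as the earlier
> sketch):
>
>    public class UnsafeCountingConstraint extends BaseConstraint {
>      // BAD: shared across every thread running this check in the region.
>      private static long seen = 0;
>
>      @Override
>      public void check(Put put) throws ConstraintException {
>        seen++; // racy read-modify-write: concurrent checks lose updates
>        if (seen > 100) {
>          throw new ConstraintException("too many puts: " + seen);
>        }
>      }
>    }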
>
> Take your example of column value == 10. Two different requests go into the
> HRegion. They each run the Equals10Constraint in a different thread. The
> constraint allows each to pass (let's assume here that they both do == 10).
> Then each will hit the remainder of HRegion.internalPut(), go through the
> updatesLock and then also get serialized in the call to
> HRegion.applyFamilyMapToMemstore(familyMap, null). So the atomicity of the
> writes is preserved. This also works for n concurrent writes, and where not
> all writes pass the constraint.
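>
> (For reference, the Equals10Constraint here would presumably be no more
> than the following - all local state, which is why each thread is fine:)
>
>    public class Equals10Constraint extends BaseConstraint {
>      @Override
>      public void check(Put put) throws ConstraintException {
>        for (List<KeyValue> kvs : put.getFamilyMap().values()) {
>          for (KeyValue kv : kvs) {
>            if (Bytes.toInt(kv.getValue()) != 10) {
>              throw new ConstraintException("value != 10");
>            }
>          }
>        }
>      }
>    }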
>
>
>
>>
>> I think we'll need an approach that makes the constraint check and the
>> mutation _atomic_, either
>> a) by taking a row lock and passing that into put / checkAndPut, or
>> b) by referencing & checking the constraint directly from within the put
>> / checkAndPut methods (like we do with the comparator, for instance)
>>
>>
> However, if we wanted to do a read of the table within the coprocessor, we
> would want to actually pass in the rowlock so we can ensure that we don't
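>
> For what it's worth, a rough sketch of the explicit-row-lock idea from (a)
> above, against the 0.90-era client API (lockRow/unlockRow were later
> deprecated; the check and column names are made up):
>
>    import org.apache.hadoop.hbase.HBaseConfiguration;
>    import org.apache.hadoop.hbase.client.*;
>    import org.apache.hadoop.hbase.util.Bytes;
>
>    HTable table = new HTable(HBaseConfiguration.create(), "t1");
>    byte[] row = Bytes.toBytes("row1");
>    RowLock lock = table.lockRow(row);   // held until explicitly released
>    try {
>      Result current = table.get(new Get(row, lock));
>      if (passesConstraint(current)) {   // hypothetical check
>        Put put = new Put(row, lock);    // same lock, so no interleaving
>        put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes(10));
>        table.put(put);
>      }
>    } finally {
>      table.unlockRow(lock);
>    }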