HBase, mail # user - Coprocessors and batch processing


Re: Coprocessors and batch processing
Himanshu Vashishtha 2011-08-11, 06:21
Client-side batch processing is done at the RegionServer level, i.e., all
Action objects are grouped per RegionServer (RS) and sent in one RPC. Once
the batch arrives at an RS, it is distributed across the corresponding
Regions, and the Action objects are processed one by one. This includes the
coprocessor's Exec objects too.
So a coprocessor works at "Region"-level granularity.
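
The grouping described above can be sketched in plain Java. This is only an illustration and not HBase code: the Action class below is a hypothetical stand-in that already carries the server and region names, whereas the real client looks up region locations itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BatchGrouping {
    /** Hypothetical stand-in for a client Action: a row plus the server and region hosting it. */
    static final class Action {
        final String row;
        final String server;
        final String region;

        Action(String row, String server, String region) {
            this.row = row;
            this.server = server;
            this.region = region;
        }
    }

    /** Groups actions per server (one RPC each), then per region within each server. */
    static Map<String, Map<String, List<Action>>> group(List<Action> actions) {
        Map<String, Map<String, List<Action>>> byServer = new LinkedHashMap<>();
        for (Action a : actions) {
            byServer.computeIfAbsent(a.server, s -> new LinkedHashMap<>())
                    .computeIfAbsent(a.region, r -> new ArrayList<>())
                    .add(a);
        }
        return byServer;
    }

    public static void main(String[] args) {
        List<Action> batch = new ArrayList<>();
        batch.add(new Action("row1", "rs1", "regionA"));
        batch.add(new Action("row2", "rs1", "regionB"));
        batch.add(new Action("row3", "rs2", "regionC"));
        batch.add(new Action("row4", "rs1", "regionA"));

        // One outer entry per server corresponds to one RPC; the inner lists
        // are what each region processes one action at a time.
        group(batch).forEach((server, regions) ->
            regions.forEach((region, list) ->
                System.out.println(server + " / " + region + " -> " + list.size() + " action(s)")));
    }
}
```

Each region only ever sees its own inner list, which is why a region-level hook has no view of the batch as a whole.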

If you want to take some action from a CP (say, process a bunch of rows of
another table), you can get an HTable instance from the Coprocessor's
Environment instance and use the same batching mechanism the client side uses.
Will that help in your use case?

Thanks,
Himanshu
On Wed, Aug 10, 2011 at 11:46 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Here's another coprocessor question...
>
> From the client we batch operations in order to reduce the number of round
> trips.
> Currently there is no way (that I can find) to make use of those batches in
> coprocessors.
>
> This is an issue when, for example, sets of puts and gets are (partially)
> forwarded to another table by the coprocessor.
> Right now this would need to use many single puts/deletes/gets from the
> various {pre|post}{put|delete|get} hooks.
>
> There is no useful demarcation, other than maybe waiting a few milliseconds,
> which is awkward.
>
>
> Of course this forwarding could be done directly from the client, but then
> what's the point of coprocessors?
>
> I guess there could either be a {pre|post}Multi on RegionObserver (although
> HRegionServer.multi does a lot of munging).
> Or maybe a general {pre|post}Request with no arguments - in which case it
> would be at least possible to write code in the coprocessor
> to collect the puts/deletes/etc through the normal single
> prePut/preDelete/etc hooks and then batch-process them in postRequest().
>
> -- Lars
>
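
The collect-then-flush idea from the quoted message could look roughly like the sketch below. Everything in it is hypothetical: postRequest() is the hook being proposed and is not an existing RegionObserver method, prePut() here takes a plain row string instead of the real hook arguments, and BatchSink stands in for a batched write to the second table.

```java
import java.util.ArrayList;
import java.util.List;

public class BufferingObserverSketch {
    /** Hypothetical sink standing in for one batched write to the other table. */
    interface BatchSink {
        void writeBatch(List<String> rows);
    }

    /** Collects rows seen by the single-operation hook and forwards them once per request. */
    static final class BufferingObserver {
        private final List<String> pending = new ArrayList<>();
        private final BatchSink sink;

        BufferingObserver(BatchSink sink) {
            this.sink = sink;
        }

        /** Simplified stand-in for the per-operation prePut hook. */
        void prePut(String row) {
            pending.add(row);
        }

        /** Proposed (not existing) hook: flush everything buffered during this request. */
        void postRequest() {
            if (pending.isEmpty()) {
                return;
            }
            sink.writeBatch(new ArrayList<>(pending)); // hand over a copy, then reset
            pending.clear();
        }
    }

    public static void main(String[] args) {
        List<List<String>> calls = new ArrayList<>();
        BufferingObserver observer = new BufferingObserver(calls::add);

        // Three single puts arrive as part of one client batch...
        observer.prePut("row1");
        observer.prePut("row2");
        observer.prePut("row3");
        // ...and are forwarded together when the request ends.
        observer.postRequest();

        System.out.println(calls.size() + " batch call(s): " + calls);
    }
}
```

A real observer would also have to keep the buffer per request (for example per handler thread), since a region server handles many requests concurrently; the sketch ignores that.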