HBase >> mail # user >> Coprocessors and batch processing

Re: Coprocessors and batch processing
Hey Lars,
Sorry if I have misled you.

The current Coprocessor infrastructure is at _Region_ level, not at
_RegionServer_ level.
All these batch operations ultimately end up at rows in some
Regions, where you have hooked your CPs.

I am not able to follow your example. If you end up with a batch of 200 Puts
at a RS, then what? You need to execute it at the Region level now, right? A RS
just hosts Regions, and these Regions are mobile too. I think that is the
reason one should nail down a specific Region when doing any CRUD
operations here, and also a reason why current CPs are at the Region level.

That's my pov though! :)
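
To make the routing concrete, here is a minimal sketch of the two grouping steps discussed in this thread: rows are first grouped per RegionServer, and each server's chunk is then split across the Regions that own the rows. It is a toy model built on plain Java collections, not HBase code. The class BatchRoutingSketch, its Region type, the server names rs1..rs5, and the 10-region layout are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: plain collections, no HBase classes and no RPCs.
final class BatchRoutingSketch {

    /** Hypothetical region descriptor: a start key plus the server hosting it right now. */
    static final class Region {
        final String startKey;
        final String server;

        Region(String startKey, String server) {
            this.startKey = startKey;
            this.server = server;
        }
    }

    /** Regions ordered by start key, so floorEntry finds the region that owns a row. */
    private final TreeMap<String, Region> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String server) {
        regionsByStartKey.put(startKey, new Region(startKey, server));
    }

    private Region regionFor(String row) {
        // The first region uses "" as its start key, so every row has a floor entry.
        return regionsByStartKey.floorEntry(row).getValue();
    }

    /** Client-side step: one chunk of rows per server. */
    Map<String, List<String>> groupByServer(List<String> rows) {
        Map<String, List<String>> chunks = new LinkedHashMap<>();
        for (String row : rows) {
            chunks.computeIfAbsent(regionFor(row).server, s -> new ArrayList<>()).add(row);
        }
        return chunks;
    }

    /** Server-side step: split one chunk across the regions that own its rows. */
    Map<String, List<String>> splitByRegion(List<String> chunk) {
        Map<String, List<String>> perRegion = new LinkedHashMap<>();
        for (String row : chunk) {
            perRegion.computeIfAbsent(regionFor(row).startKey, k -> new ArrayList<>()).add(row);
        }
        return perRegion;
    }

    /** Assumed layout: 10 regions of 100 rows each, two regions per server. */
    static BatchRoutingSketch exampleLayout() {
        BatchRoutingSketch sketch = new BatchRoutingSketch();
        for (int i = 0; i < 10; i++) {
            String startKey = (i == 0) ? "" : String.format("row%04d", i * 100);
            sketch.addRegion(startKey, "rs" + (i / 2 + 1));
        }
        return sketch;
    }

    static List<String> exampleRows(int count) {
        List<String> rows = new ArrayList<>(count);
        for (int n = 0; n < count; n++) {
            rows.add(String.format("row%04d", n));
        }
        return rows;
    }

    public static void main(String[] args) {
        BatchRoutingSketch sketch = exampleLayout();
        Map<String, List<String>> chunks = sketch.groupByServer(exampleRows(1000));
        for (Map.Entry<String, List<String>> chunk : chunks.entrySet()) {
            Map<String, List<String>> perRegion = sketch.splitByRegion(chunk.getValue());
            System.out.println(chunk.getKey() + ": " + chunk.getValue().size()
                    + " rows across " + perRegion.size() + " regions");
        }
    }
}
```

In this toy layout the server-level chunk exists only until it is split; anything hooked at the Region level sees the per-Region pieces, which is the granularity being discussed here.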

On Thu, Aug 11, 2011 at 11:56 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Thanks Himanshu,
> but that is not quite what I meant.
> Yes, a batch operation is broken up in "chunks" per regionserver and then
> the chunks are shipped to the individual regionservers.
> But then there is no way to interact with those chunks (as a whole) at the
> regionserver through coprocessors.
> What I want to do is to look at the entire chunk at each regionserver and
> then do some other bulk operation based on that.
> Currently I only get pre/post hooks for single rows, and no way to group
> these together later (other than just waiting for a little bit and letting
> work accumulate).
> Say I have a client request with (say) 1000 puts, and let's also say that
> there are 5 region servers, and each happens to host exactly 1/5th of the
> rowkeys, so each region server gets a chunk of 200 puts.
> Now a coprocessor might have logic that affects another table (for example
> for naive 2ndary indexing). At the coprocessor level I can get an
> HTableInterface from the environment and now I want to do a batch put of
> 200 rows (of course those will be broken up per region server again, etc).
> Currently I can't do that, because there are only single "row" pre/post
> hooks, and no way to determine when all operations of a request are done.
> The end result is that I have to do 200 single row puts, one in each call to
> pre or post hooks.
> Does that make sense?
> -- Lars
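
The cost difference in the 200-put scenario above can be sketched with a small, self-contained model. This is only an illustration under stated assumptions: IndexTable is a made-up stand-in that counts calls instead of talking to a real table, perRowHooks mimics the single-row pre/post hooks described in the message, and chunkHook is a hypothetical hook that would see the whole chunk. None of these names are HBase APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: IndexTable and both "hooks" are hypothetical, not HBase APIs.
final class HookGranularitySketch {

    /** Stand-in for the second (index) table; it only counts calls and rows. */
    static final class IndexTable {
        int writeCalls = 0;
        int rowsWritten = 0;

        void put(String row) {
            writeCalls++;
            rowsWritten++;
        }

        void putBatch(List<String> rows) {
            writeCalls++;
            rowsWritten += rows.size();
        }
    }

    /** Per-row hook, as described in the thread: one index write per incoming put. */
    static void perRowHooks(List<String> chunk, IndexTable index) {
        for (String row : chunk) {
            index.put("idx-" + row);
        }
    }

    /** Hypothetical chunk-level hook: sees every row of the chunk at once. */
    static void chunkHook(List<String> chunk, IndexTable index) {
        List<String> indexRows = new ArrayList<>(chunk.size());
        for (String row : chunk) {
            indexRows.add("idx-" + row);
        }
        index.putBatch(indexRows);
    }

    /** Builds a chunk of made-up row keys. */
    static List<String> chunkOf(int size) {
        List<String> rows = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            rows.add("row" + i);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> chunk = chunkOf(200);

        IndexTable perRow = new IndexTable();
        perRowHooks(chunk, perRow);

        IndexTable batched = new IndexTable();
        chunkHook(chunk, batched);

        System.out.println("per-row hooks: " + perRow.writeCalls + " calls, "
                + perRow.rowsWritten + " rows");
        System.out.println("chunk hook:    " + batched.writeCalls + " calls, "
                + batched.rowsWritten + " rows");
    }
}
```

Both variants write the same 200 index rows; the per-row variant issues 200 separate calls, while the hypothetical chunk-level variant issues one batch call that the client-side batching could then split per region server as usual.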
> ________________________________
> From: Himanshu Vashishtha <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
> Sent: Wednesday, August 10, 2011 11:21 PM
> Subject: Re: Coprocessors and batch processing
> Client side batch processing is done at RegionServer level, i.e., all Action
> objects are grouped together on a per-RS basis and sent in one RPC. Once the
> batch arrives at a RS, it gets distributed across the corresponding Regions,
> and these Action objects are processed one by one. This includes
> Coprocessor's Exec objects too.
> So, a coprocessor is working at a "Region" level granularity.
> If you want to take some action (process a bunch of rows of another table
> from a CP), you can get an HTable instance from the Environment instance of
> a Coprocessor, and use the same mechanism as used by the client side.
> Will that help in your use-case?
> Thanks,
> Himanshu
> On Wed, Aug 10, 2011 at 11:46 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> > Here's another coprocessor question...
> >
> > From the client we batch operations in order to reduce the number of
> > round trips.
> > Currently there is no way (that I can find) to make use of those batches
> > in coprocessors.
> >
> > This is an issue when, for example, sets of puts and gets are (partially)
> > forwarded to another table by the coprocessor.
> > Right now this would need to use many single puts/deletes/gets from the
> > various {pre|post}{put|delete|get} hooks.
> >
> > There is no useful demarcation, other than maybe waiting a few
> > milliseconds, which is awkward.
> >
> >
> > Of course this forwarding could be done directly from the client, but
> > then what's the point of coprocessors?
> >
> > I guess there could either be a {pre|post}Multi on RegionObserver
> > (although HRegionServer.multi does a lot of munging).