HBase >> mail # user >> Coprocessors and batch processing


lars hofhansl 2011-08-11, 05:46
Himanshu Vashishtha 2011-08-11, 06:21
Xian Woo 2011-08-11, 07:39
lars hofhansl 2011-08-11, 17:56
Re: Coprocessors and batch processing
Hey Lars,
Sorry if I misled you.

The current Coprocessor infrastructure is at _Region_ level, not at
_RegionServer_ level.
All these batch operations ultimately end up at some rows in some
Regions, where you have hooked your CPs.

I am not able to follow your example. If you end up with a batch of 200 Puts
at a RS, then what? You need to execute it at Region level now, right? A RS
just hosts Regions, and these Regions are mobile too. I think that is the
reason one should nail down to a specific Region when doing any CRUD
operations here, and also a reason why current CPs are at Region level.

That's my pov though! :)

Himanshu
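The Region-level dispatch described above can be sketched with plain Java collections (no HBase dependencies; `dispatchBatch`, the hook callback, and the key-prefix-to-Region mapping are all hypothetical stand-ins for illustration): a chunk that arrives at a regionserver is split across its Regions, and the per-Region hook fires once per row, never once per chunk.

```java
import java.util.*;
import java.util.function.BiConsumer;

// Hypothetical sketch: how a batch is dispatched once it reaches a
// regionserver. Rows are grouped by Region (here: by key prefix as a
// stand-in for real key ranges), and the per-row hook is invoked once
// per row, per Region -- no hook ever sees the whole chunk.
public class RegionDispatchSketch {

    // Returns how many times the per-row "prePut"-style hook fired.
    static int dispatchBatch(List<String> rowKeys,
                             BiConsumer<String, String> preRowHook) {
        // Group the chunk by Region (first character of the row key).
        Map<String, List<String>> byRegion = new TreeMap<>();
        for (String key : rowKeys) {
            byRegion.computeIfAbsent(key.substring(0, 1),
                                     r -> new ArrayList<>()).add(key);
        }
        int hookCalls = 0;
        for (Map.Entry<String, List<String>> region : byRegion.entrySet()) {
            for (String key : region.getValue()) {
                preRowHook.accept(region.getKey(), key); // one call per row
                hookCalls++;
            }
        }
        return hookCalls;
    }

    public static void main(String[] args) {
        // A 200-put chunk spread over 4 Regions (prefixes a..d).
        List<String> chunk = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            chunk.add((char) ('a' + i % 4) + String.format("%03d", i));
        }
        int calls = dispatchBatch(chunk, (region, key) -> { /* hook body */ });
        System.out.println(calls); // prints 200: one hook call per row
    }
}
```

A real RegionObserver's pre/post hooks have the same shape: each is handed one operation at a time, with no handle on the batch it arrived in, which is exactly the limitation the rest of this thread discusses.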
On Thu, Aug 11, 2011 at 11:56 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Thanks Himanshu,
>
> but that is not quite what I meant.
>
>
> Yes, a batch operation is broken up into "chunks" per regionserver, and the
> chunks are then shipped to the individual regionservers.
> But there is no way to interact with those chunks (as a whole) at the
> regionserver through coprocessors.
>
>
> What I want to do is to look at the entire chunk at each regionserver and
> then do some other bulk operation based on that.
> Currently I only get pre/post hooks for single rows, and there is no way to
> group these together later (other than just waiting for a little bit and
> letting work accumulate).
>
> Say I have a client request with (say) 1000 puts, and let's also say that
> there are 5 region servers, each of which happens to host exactly 1/5th of
> the rowkeys, so each region server gets a chunk of 200 puts.
> Now a coprocessor might have logic that affects another table (for example
> for naive secondary indexing). At the coprocessor level I can get an
> HTableInterface from the environment, and now I want to do a batch put of
> 200 rows (of course those will be broken up per region server again, etc.).
> Currently I can't do that, because there are only single-row pre/post
> hooks, and no way to determine when all operations of a request are done.
> The end result is that I have to do 200 single-row puts, one in each call
> to the pre or post hooks.
>
>
> Does that make sense?
>
> -- Lars
>
>
>
> ________________________________
> From: Himanshu Vashishtha <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
> Sent: Wednesday, August 10, 2011 11:21 PM
> Subject: Re: Coprocessors and batch processing
>
> Client-side batch processing is done at RegionServer level, i.e., all
> Action objects are grouped together on a per-RS basis and sent in one RPC.
> Once the batch arrives at a RS, it gets distributed across the
> corresponding Regions, and these Action objects are processed one by one.
> This includes Coprocessor Exec objects too.
> So, a coprocessor works at a "Region"-level granularity.
>
> If you want to take some action (process a bunch of rows of another table
> from a CP), you can get an HTable instance from the Environment instance of
> a Coprocessor, and use the same mechanism as used by the client side.
> Will that help in your use case?
>
> Thanks,
> Himanshu
>
>
> On Wed, Aug 10, 2011 at 11:46 PM, lars hofhansl <[EMAIL PROTECTED]>
> wrote:
>
> > Here's another coprocessor question...
> >
> > From the client we batch operations in order to reduce the number of
> > round trips.
> > Currently there is no way (that I can find) to make use of those batches
> > in coprocessors.
> >
> > This is an issue when, for example, sets of puts and gets are (partially)
> > forwarded to another table by the coprocessor.
> > Right now this would need to use many single puts/deletes/gets from the
> > various {pre|post}{put|delete|get} hooks.
> >
> > There is no useful demarcation, other than maybe waiting a few
> > milliseconds, which is awkward.
> >
> >
> > Of course this forwarding could be done directly from the client, but
> > then what's the point of coprocessors?
> >
> > I guess there could either be a {pre|post}Multi on RegionObserver
> > (although HRegionServer.multi does a lot of munging).
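The per-row workaround the thread keeps circling back to, buffering single-row puts from the pre/post hooks and flushing them as one batch once "enough" have accumulated, can be sketched without HBase. The `BufferedForwarder` class and its flush callback are hypothetical; a real coprocessor would hand the buffered Puts to an HTableInterface obtained from its environment.

```java
import java.util.*;
import java.util.function.Consumer;

// Hypothetical sketch of the accumulate-and-flush workaround: each
// per-row hook call adds one put to a buffer, and the buffer is flushed
// as a single batch when a size threshold is reached. The awkwardness
// the thread complains about is visible here: without a batch-level
// hook there is no reliable signal that a request is finished, so the
// threshold (or a timer) is a guess.
public class BufferedForwarder {
    private final int flushSize;
    private final Consumer<List<String>> flush; // stand-in for a batch put
    private final List<String> buffer = new ArrayList<>();

    BufferedForwarder(int flushSize, Consumer<List<String>> flush) {
        this.flushSize = flushSize;
        this.flush = flush;
    }

    // Called once from each single-row pre/post hook invocation.
    synchronized void add(String rowKey) {
        buffer.add(rowKey);
        if (buffer.size() >= flushSize) {
            flush.accept(new ArrayList<>(buffer)); // one batch, not N RPCs
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        List<List<String>> batches = new ArrayList<>();
        BufferedForwarder fwd = new BufferedForwarder(50, batches::add);
        for (int i = 0; i < 200; i++) {
            fwd.add("row-" + i);
        }
        // 200 single-row hook calls became 4 batched flushes of 50.
        System.out.println(batches.size()); // prints 4
    }
}
```

A {pre|post}Multi-style hook of the kind proposed above would make the threshold unnecessary: the coprocessor would see the whole per-regionserver chunk directly and could issue one batch put per request.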
Gary Helmling 2011-08-11, 18:24
lars hofhansl 2011-08-11, 20:01
Gary Helmling 2011-08-16, 06:35
lars hofhansl 2011-08-17, 04:05
Lars 2011-08-11, 15:45