Re: Coprocessors and batch processing
Hi Lars,

> Should then all RPC triggered by a coprocessor be avoided (and hence the use
> of the env-provided HTableInterface be generally discouraged)?

I would generally avoid making synchronous RPC calls within a direct
coprocessor call path (blocking a handler thread waiting on the response).
Making the same calls asynchronously (say queuing puts for a secondary index
to be handled by a background thread) is generally better because you're not
tying up a constrained resource that other clients will be contending on.
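
As a rough illustration of the queuing approach, here is a minimal sketch in plain Java. The class name AsyncIndexWriter, its methods, and the sink callback are all hypothetical and not part of HBase; the sink stands in for whatever actually writes the secondary-index puts. The point is only that the hook path does a non-blocking offer() while the slow write happens on a separate thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper (not an HBase class): queues updates for a background writer. */
final class AsyncIndexWriter<T> implements AutoCloseable {
    private final BlockingQueue<T> queue;
    private final Consumer<List<T>> sink;   // assumption: performs the real (blocking) index write
    private final Thread worker;
    private volatile boolean running = true;

    AsyncIndexWriter(int capacity, Consumer<List<T>> sink) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.sink = sink;
        this.worker = new Thread(this::drainLoop, "index-writer");
        this.worker.setDaemon(true);
        this.worker.start();
    }

    /** Called from the hook path: never blocks; false means full or closed. */
    boolean offer(T update) {
        return running && queue.offer(update);
    }

    private void drainLoop() {
        try {
            // Keep going until close() was called and the queue has been emptied.
            while (running || !queue.isEmpty()) {
                T first = queue.poll(100, TimeUnit.MILLISECONDS);
                if (first == null) {
                    continue;
                }
                List<T> batch = new ArrayList<>();
                batch.add(first);
                queue.drainTo(batch, 99);   // batch up to 100 updates per write
                sink.accept(batch);         // the blocking call happens off the handler thread
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stops accepting updates and waits for queued ones to be written. */
    @Override
    public void close() {
        running = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A hook would call offer() and return immediately. What to do when offer() returns false (drop, log, or fall back to a synchronous write) is left open here, as are retries and error handling in the sink.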

The cp environment provided HTableInterface is there to be used, but it's
still up to you to use it wisely.  But providing that resource as part of
the coprocessor framework (instead of using a standard client HTable) will
potentially allow us to do other optimizations like using a separate
priority for coprocessor originating RPCs (handled by a different thread
pool than client RPCs), or short circuiting the RPC stack for calls to
regions residing on the same region server (and avoiding tying up another
handler thread).
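
To make the separate-pool idea concrete, here is a small sketch in plain Java. It is not how the HBase RPC server is implemented; PartitionedHandlers, Origin, and the thread names are made up for illustration. It only shows calls being routed to different fixed-size pools by origin, so that coprocessor-originated work cannot use up the client handler threads.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/** Hypothetical dispatcher (not an HBase class): one handler pool per call origin. */
final class PartitionedHandlers implements AutoCloseable {
    enum Origin { CLIENT, COPROCESSOR }

    private final ExecutorService clientPool;
    private final ExecutorService coprocessorPool;

    PartitionedHandlers(int clientThreads, int coprocessorThreads) {
        clientPool = Executors.newFixedThreadPool(
                clientThreads, r -> daemon(r, "client-handler"));
        coprocessorPool = Executors.newFixedThreadPool(
                coprocessorThreads, r -> daemon(r, "coprocessor-handler"));
    }

    /** Runs the call on the pool that matches where it came from. */
    <T> CompletableFuture<T> submit(Origin origin, Supplier<T> call) {
        ExecutorService pool =
                (origin == Origin.COPROCESSOR) ? coprocessorPool : clientPool;
        return CompletableFuture.supplyAsync(call, pool);
    }

    private static Thread daemon(Runnable r, String name) {
        Thread t = new Thread(r, name);
        t.setDaemon(true);
        return t;
    }

    @Override
    public void close() {
        clientPool.shutdown();
        coprocessorPool.shutdown();
    }
}
```

With this split, a burst of coprocessor-originated calls queues up behind the coprocessor pool only, and client requests keep their own handlers.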

Those are just a couple examples, but I think ultimately we'll want a bit
more constraint over what coprocessor code is allowed to do, for the sake of
better guaranteeing cluster stability.  Currently it's a void-your-warranty
type scenario. :)

> I still think a RegionServer or RpcServer level "preRequest" and
> "postRequest" (or whatever) hooks would be useful for a variety of
> scenarios.

I could see that, but it's definitely not part of the RegionObserver
contract.  I do also worry that a profusion of too many different
coprocessor interfaces will lead to confusion about how to actually go about
implementing a given application's needs.  But we're still pretty early on
in the development of coprocessors and I don't pretend we have everything

Do you have some specific scenarios where you think a preRequest/postRequest
interface would be better suited?  Feel free to open up a JIRA where we can
walk through them!  We could try to model out an RPC listener or RPC filter
interface.  I think that interacting at the RPC layer (in front of all of
the core HBase code) will be a bit limited in what you have access to.  Many
of the current coprocessor hooks are situated at key points after a lot of
setup or initialization has gone on to provide the call context.  But for a
given set of needs that may not be an issue (or may even be an advantage).
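
For the sake of discussion, an RPC-level filter might look something like the sketch below. No such interface exists in HBase as far as this thread goes; RpcFilter, FilteredDispatcher, and the method signatures are purely hypothetical. Note that the filter only sees the method name and the raw parameter, which is the kind of limited context mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical interface (not part of HBase): hooks around every RPC. */
interface RpcFilter {
    /** Return false to reject the request before it reaches the handler. */
    boolean preRequest(String method, Object param);

    /** Called after the handler has produced a result. */
    void postRequest(String method, Object param, Object result);
}

/** Hypothetical dispatcher that runs the filters in front of the real handler. */
final class FilteredDispatcher {
    private final List<RpcFilter> filters = new ArrayList<>();

    void addFilter(RpcFilter filter) {
        filters.add(filter);
    }

    Object dispatch(String method, Object param, Function<Object, Object> handler) {
        // Filters run before any region lookup or other call setup has happened.
        for (RpcFilter f : filters) {
            if (!f.preRequest(method, param)) {
                throw new IllegalStateException("rejected by filter: " + method);
            }
        }
        Object result = handler.apply(param);
        for (RpcFilter f : filters) {
            f.postRequest(method, param, result);
        }
        return result;
    }
}
```

Something this generic could suit cross-cutting needs such as auditing or throttling, while anything that needs region or row context would still fit better in the existing observer hooks.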