HBase, mail # user - Re: Coprocessors

Re: Coprocessors
Gary Helmling 2013-04-25, 22:35
> I'm looking to write a service that runs alongside the region servers and
> acts as a proxy between my application and the region servers.
> I plan to use the logic in HBase client's HConnectionManager, to segment
> my request of 1M rowkeys into sub-requests per region-server. These are
> sent over to the proxy which fetches the data from the region server,
> aggregates locally and sends data back. Does this sound reasonable or even
> a useful thing to pursue?
This is essentially what coprocessor endpoints (called through
HTable.coprocessorExec()) do.  One difference is that there is a parallel
request per region, not per region server, though that is a potential
optimization that could be made as well.

The tricky part I see for the case you describe is splitting your full set
of row keys up correctly per region.  You could send the full set of row
keys to each endpoint invocation, and have the endpoint implementation
filter down to only those keys present in the current region.  But that
would be a lot of overhead on the request side, since every per-region
request would carry all 1M keys.  Alternatively, you could split the row
keys into per-region sets on the client side, but I'm not sure we provide
sufficient context for the Batch.Callable instance you provide to
coprocessorExec() to determine which region it is being invoked against.
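
To make the client-side split concrete, here is a minimal sketch in plain Java
with no HBase dependency. It assumes you already have the sorted region start
keys (on a real table these could come from something like
HTable.getStartKeys()) and that keys compare lexicographically as Strings,
whereas HBase compares raw bytes. The class name RegionKeySplitter and the
method splitByRegion are hypothetical, made up for this illustration. It also
does not address the open question above of how a Batch.Callable would learn
which region it is running against.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of HBase.
public class RegionKeySplitter {

    // Groups row keys by the start key of the region that would contain them.
    // Assumption: regionStartKeys contains "" for the first region, mirroring
    // the empty start key HBase uses for the first region of a table.
    static Map<String, List<String>> splitByRegion(List<String> regionStartKeys,
                                                   List<String> rowKeys) {
        TreeSet<String> starts = new TreeSet<>(regionStartKeys);
        Map<String, List<String>> perRegion = new TreeMap<>();
        for (String rowKey : rowKeys) {
            // A region owns [startKey, nextStartKey), so the owning region is
            // the one with the greatest start key that is <= the row key.
            String owner = starts.floor(rowKey);
            if (owner == null) {
                throw new IllegalArgumentException("no region covers key: " + rowKey);
            }
            perRegion.computeIfAbsent(owner, k -> new ArrayList<>()).add(rowKey);
        }
        return perRegion;
    }

    public static void main(String[] args) {
        // Three made-up regions: ["", "g"), ["g", "p"), ["p", end of table).
        List<String> starts = Arrays.asList("", "g", "p");
        List<String> keys = Arrays.asList("apple", "grape", "melon", "zebra", "g");
        System.out.println(splitByRegion(starts, keys));
    }
}
```

Each resulting list is the sub-request for one region. Keep in mind that
regions can split or move between computing the split and issuing the calls,
so a real client would need to handle stale boundaries.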