HBase, mail # user - EndPoint Coprocessor could be deadlocked?


Re: EndPoint Coprocessor could be deadlocked?
Michael Segel 2012-05-17, 17:39

Fei Ding,

I'm a little confused.
Are you trying to solve the problem of querying data efficiently from a table, or are you trying to find an example of where and when to use coprocessors?

You actually have an interesting problem that isn't easily solved in relational databases, but I don't think it's an appropriate problem if you want to stress the use of coprocessors.

Yes, with indexes you want to use coprocessors as a way to keep the index in sync with the underlying table.

However, beyond that... the solution is really best run as an M/R job.

Consider that HBase has two different access methods: one is as part of M/R jobs, the other is a client/server model. If you wanted to, you could create a service/engine/app that would allow you to efficiently query and return result sets from your database, as well as manage indexes.
In part, coprocessors make this a lot easier.

If you consider the general flow of my solution earlier in this thread, you now have a really great way to implement this.

Note: we're really talking about allowing someone to query data from a table using multiple indexes and index types. Think alternate table (key/value pair), Lucene/SOLR, and GeoSpatial.

You could even benchmark it against an Oracle implementation, and probably smoke it.
You could also do efficient joins between tables.

So yeah, I would encourage you to work on your initial problem... ;-)

Just Saying...  ;-)

-Mike

On May 16, 2012, at 8:49 PM, Andrew Purtell wrote:

> On Wed, May 16, 2012 at 6:43 PM, fding hbase <[EMAIL PROTECTED]> wrote:
>>> Not coprocessors in general. The client side support for Endpoints
>>> (Exec, etc.) gives the developer the fiction of addressing the cluster
>>> as a range of rows, and will parallelize per-region Endpoint
>>> invocations, and collect the responses, and can return them all to the
>>> caller as "a single call".
>>
>> But on the deadlock problem the Endpoint behaves the same way as Observer.
>> Endpoints are also executed via RPC handlers of RegionServer.
>
> Reread what I wrote. I'm not talking about the server side above.
>
> Regarding the RPC issues, yes the behavior is the same. My other point
> was there is no RPC deadlock if you schedule your additional work
> (which issues RPCs) in some background thread or Executor and return
> to the client immediately. But that is not what you have claimed you
> want to do, you want to do some distributed indexed join if I
> understood it correctly *first* (via RPC) and *then* return to the
> client. That is how you would get deadlocks.
>
>> the coprocessors are written by users and any kind of
>> code may appear on the server side.
>
> You should not let just any user run coprocessors on the server. That's madness.
>
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet
> Hein (via Tom White)
>
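
Illustration of the background-Executor point in the quoted reply above. This is only a sketch in plain Java, not HBase code: the "handlers" pool stands in for a RegionServer's fixed set of RPC handler threads, and the class name, pool sizes, and the empty follow-up task are all made up for the example. Each simulated call hands its follow-up work to a separate pool and acknowledges right away, so no handler thread ever sits waiting on work that would itself need a handler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BackgroundWorkSketch {
    // Assumption: a small fixed pool standing in for the server's RPC handlers.
    static final int HANDLERS = 2;

    /** Runs `requests` simulated calls and returns how many follow-up tasks completed. */
    static int run(int requests) throws Exception {
        ExecutorService handlers = Executors.newFixedThreadPool(HANDLERS);
        ExecutorService background = Executors.newFixedThreadPool(2);
        AtomicInteger followUpsDone = new AtomicInteger();
        CountDownLatch allFollowUps = new CountDownLatch(requests);
        List<Future<String>> acks = new ArrayList<>();
        try {
            for (int i = 0; i < requests; i++) {
                final int id = i;
                acks.add(handlers.submit(() -> {
                    // Hand the follow-up work off; the handler does not wait for it.
                    background.execute(() -> {
                        // Hypothetical placeholder: calls to other servers would go here.
                        followUpsDone.incrementAndGet();
                        allFollowUps.countDown();
                    });
                    // Return to the caller immediately, freeing the handler thread.
                    return "ack-" + id;
                }));
            }
            // Every caller gets its acknowledgement even though requests > HANDLERS.
            for (Future<String> ack : acks) {
                ack.get(5, TimeUnit.SECONDS);
            }
            // The follow-up work finishes later, on the background pool.
            if (!allFollowUps.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("background work did not finish");
            }
            return followUpsDone.get();
        } finally {
            handlers.shutdown();
            background.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("follow-ups completed: " + run(8));
    }
}
```

The trade-off is the one raised in the thread: the caller gets its answer before the follow-up work is done, so this shape does not fit a request that must finish a distributed join first and only then reply.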