HBase >> mail # user >> EndPoint Coprocessor could be deadlocked?


Re: EndPoint Coprocessor could be deadlocked?

Fei Ding,

I'm a little confused.
Are you trying to solve the problem of querying data efficiently from a table, or are you trying to find an example of where and when to use coprocessors?

You actually have an interesting problem that isn't easily solved in relational databases, but I don't think it's an appropriate problem if you want to stress the use of coprocessors.

Yes, with indexes you want to use coprocessors as a way to keep the index in sync with the underlying table.
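
To illustrate the index-maintenance idea above, here is a minimal sketch in plain Java. It is not the HBase coprocessor API: `IndexedTableSketch`, `put`, `postPut`, and `lookup` are hypothetical names, and two in-memory maps stand in for the base table and the index table. The `postPut` hook plays the role that an Observer coprocessor would play on the server.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory model, NOT the HBase API. The postPut hook stands in
// for what an Observer coprocessor would do after a write reaches the region.
class IndexedTableSketch {
    // Base table: row key -> value of the indexed column.
    private final Map<String, String> table = new TreeMap<>();
    // Index table: column value -> row keys currently holding that value.
    private final Map<String, Set<String>> index = new TreeMap<>();

    // Write to the base table, then let the hook update the index.
    void put(String rowKey, String value) {
        String oldValue = table.put(rowKey, value);
        postPut(rowKey, oldValue, value);
    }

    // Remove the stale index entry (if any) and add the new one.
    private void postPut(String rowKey, String oldValue, String newValue) {
        if (oldValue != null) {
            Set<String> rows = index.get(oldValue);
            if (rows != null) {
                rows.remove(rowKey);
                if (rows.isEmpty()) {
                    index.remove(oldValue);
                }
            }
        }
        index.computeIfAbsent(newValue, k -> new TreeSet<>()).add(rowKey);
    }

    // Query by column value through the index instead of scanning the table.
    Set<String> lookup(String value) {
        Set<String> rows = index.get(value);
        return rows == null ? new TreeSet<String>() : rows;
    }
}
```

The point of the hook is that the caller only ever writes to the base table; the index follows automatically, including when a row's value changes. In a real cluster the index rows may live on another region server, which is where the RPC concerns discussed in this thread come in.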

However beyond that... the solution is really best run as a M/R job.

Consider that HBase has two different access methods: one is as part of M/R jobs, the other is a client/server model. If you wanted to, you could build a service/engine/app that efficiently queries and returns result sets from your database, and also manages indexes.
In part, coprocessors make this a lot easier.

If you consider the general flow of my solution earlier in this thread, you now have a really great way to implement this.

Note: we're really talking about allowing someone to query data from a table using multiple indexes and index types. Think alternate table (key/value pair), Lucene/Solr, and geospatial.

You could even benchmark it against an Oracle implementation, and probably smoke it.
You could also do efficient joins between tables.

So yeah, I would encourage you to work on your initial problem... ;-)

Just Saying...  ;-)

-Mike

On May 16, 2012, at 8:49 PM, Andrew Purtell wrote:

> On Wed, May 16, 2012 at 6:43 PM, fding hbase <[EMAIL PROTECTED]> wrote:
>>> Not coprocessors in general. The client side support for Endpoints
>>> (Exec, etc.) gives the developer the fiction of addressing the cluster
>>> as a range of rows, and will parallelize per-region Endpoint
>>> invocations, and collect the responses, and can return them all to the
>>> caller as "a single call".
>>
>> But on the deadlock problem the Endpoint behaves the same way as Observer.
>> Endpoints are also executed via RPC handlers of RegionServer.
>
> Reread what I wrote. I'm not talking about the server side above.
>
> Regarding the RPC issues, yes the behavior is the same. My other point
> was there is no RPC deadlock if you schedule your additional work
> (which issues RPCs) in some background thread or Executor and return
> to the client immediately. But that is not what you have claimed you
> want to do; you want to do some distributed indexed join, if I
> understood it correctly *first* (via RPC) and *then* return to the
> client. That is how you would get deadlocks.
>
>> the coprocessors are written by users and any kind of
>> code may appear on the server side.
>
> You should not let just any user run coprocessors on the server. That's madness.
>
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet
> Hein (via Tom White)
>
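
Andy's point about scheduling the RPC-issuing work on a background thread or Executor can be sketched in plain Java. This is a hypothetical model, not HBase code: the `handlers` pool stands in for a region server's fixed set of RPC handler threads (sized to one here to make the constraint obvious), `background` is the separate Executor, and `BackgroundWorkSketch` and `run` are made-up names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model, NOT HBase code: "handlers" stands in for a region
// server's fixed pool of RPC handler threads.
class BackgroundWorkSketch {
    static String run() {
        ExecutorService handlers = Executors.newFixedThreadPool(1);
        ExecutorService background = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<String> followUp = new CompletableFuture<>();

            // Incoming call: hand the follow-up work to the background
            // executor and return at once, freeing the handler thread.
            String ack = handlers.submit(() -> {
                background.submit(() -> {
                    try {
                        // The follow-up work issues its own "RPC", which
                        // needs a free handler thread to be served.
                        String reply = handlers.submit(() -> "index updated").get();
                        followUp.complete(reply);
                    } catch (Exception e) {
                        followUp.completeExceptionally(e);
                    }
                });
                return "accepted";
            }).get(5, TimeUnit.SECONDS);

            return ack + ", " + followUp.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            handlers.shutdownNow();
            background.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

If the incoming call instead issued the nested request itself and waited for the reply before returning, the only handler thread would be blocked waiting on work that needs that same thread, which is the deadlock described above. The trade-off is that the client gets its answer before the follow-up work has finished.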