Re: EndPoint Coprocessor could be deadlocked?
Fei Ding,

I think you're making the solution harder than it should be.

To start with, the only thing you need to do is use co-processors to keep the indexes in sync with the underlying table.

The code called from the co-processor will depend on the type of action and the type of index you are using.
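
To make the index-sync idea concrete, here is a minimal in-memory sketch, not HBase code. The class `IndexSyncSketch` and its methods are hypothetical names invented for illustration; in a real deployment the same logic would presumably sit in coprocessor hooks on the region server. It only shows that the index update differs by action: a put may have to drop a stale index entry first, while a delete only removes one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory model; none of these names are HBase APIs.
class IndexSyncSketch {
    // Base table: row key -> value of the single indexed column.
    private final Map<String, String> table = new HashMap<>();
    // Index table: column value -> row keys currently holding that value.
    private final Map<String, Set<String>> index = new TreeMap<>();

    // Write path. The index maintenance here stands in for what a
    // coprocessor hook would do when a put reaches the table.
    void put(String rowKey, String value) {
        String old = table.put(rowKey, value);
        if (old != null) {
            removeFromIndex(old, rowKey); // drop the stale index entry
        }
        index.computeIfAbsent(value, v -> new TreeSet<String>()).add(rowKey);
    }

    // Delete path: only the removal half of the put logic is needed.
    void delete(String rowKey) {
        String old = table.remove(rowKey);
        if (old != null) {
            removeFromIndex(old, rowKey);
        }
    }

    private void removeFromIndex(String value, String rowKey) {
        Set<String> keys = index.get(value);
        if (keys == null) {
            return;
        }
        keys.remove(rowKey);
        if (keys.isEmpty()) {
            index.remove(value); // keep the index free of empty rows
        }
    }

    // Index lookup: all row keys whose indexed column equals the value.
    Set<String> lookup(String value) {
        Set<String> keys = index.get(value);
        return keys == null ? new TreeSet<String>() : new TreeSet<String>(keys);
    }

    public static void main(String[] args) {
        IndexSyncSketch t = new IndexSyncSketch();
        t.put("row1", "red");
        t.put("row2", "red");
        t.put("row1", "blue"); // update moves row1 from "red" to "blue"
        System.out.println(t.lookup("red"));  // prints [row2]
        System.out.println(t.lookup("blue")); // prints [row1]
    }
}
```

A different index type (Lucene/SOLR, geospatial) would replace the body of the put and delete handling, but the split by action stays the same.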

Then you only need to focus on how you use the index and then on how you implement the intersection of the result sets.

One idea I had was to invert the intersection table so that you would have N rows, where each row contains a result set. Then you fetch one row to get your row keys.
So if you have 3 indexes and want their intersection, fetching the row whose key is 3 yields the intersection, rather than scanning the key values and checking the intersection count of each. (This could work, but you may have issues with very large result sets. How many columns can you have?)
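
One way to read the inverted intersection table is sketched below in plain Java, with in-memory maps standing in for HBase tables. The class and method names (`InvertedIntersectionSketch`, `invert`, `intersection`, `keys`) are hypothetical, and the sketch assumes each index returns a set of row keys, so a key is counted at most once per index.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory model of the inverted intersection table.
class InvertedIntersectionSketch {

    // Convenience helper to build a result set of row keys.
    static Set<String> keys(String... rowKeys) {
        return new TreeSet<String>(Arrays.asList(rowKeys));
    }

    // Inverted table: match count -> row keys found in exactly that
    // many index result sets. Each count is one "row"; the row keys
    // play the role of its columns.
    static Map<Integer, Set<String>> invert(List<Set<String>> indexResults) {
        // Non-inverted form: row key -> number of indexes that matched it.
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> result : indexResults) {
            for (String rowKey : result) {
                counts.merge(rowKey, 1, Integer::sum);
            }
        }
        Map<Integer, Set<String>> inverted = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            inverted.computeIfAbsent(e.getValue(), c -> new TreeSet<String>())
                    .add(e.getKey());
        }
        return inverted;
    }

    // With N indexes, the single row keyed by N is the intersection.
    static Set<String> intersection(List<Set<String>> indexResults) {
        Set<String> row = invert(indexResults).get(indexResults.size());
        return row == null ? new TreeSet<String>() : row;
    }

    public static void main(String[] args) {
        List<Set<String>> results = Arrays.asList(
                keys("a", "b", "c"),
                keys("b", "c", "d"),
                keys("b", "c", "e"));
        System.out.println(invert(results));       // prints {1=[a, d, e], 3=[b, c]}
        System.out.println(intersection(results)); // prints [b, c]
    }
}
```

The single fetch replaces a scan over every row key, at the cost of one very wide row when the result sets are large, which is the column-count concern raised above.
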
The point is that if you place your focus first on the problem and then secondly on the mechanics you will have an easier time solving the problem. The only catch is that you have to be able to work in the abstract.



PS. This really is an interesting problem which when solved will help with the evolution of HBase more as a Database than as a persistent object store.

On May 17, 2012, at 7:38 PM, fding hbase wrote:

> Hi Michel,
> On Fri, May 18, 2012 at 1:39 AM, Michael Segel <[EMAIL PROTECTED]> wrote:
>>> You should not let just any user run coprocessors on the server. That's
>> madness.
>>> Best regards,
>>>   - Andy
>> Fei Ding,
>> I'm a little confused.
>> Are you trying to solve the problem of querying  data efficiently from a
>> table, or are you trying to find an example of where and when  to use
>> co-processors?
> I'm trying to solve the problem of querying data efficiently. Coprocessor
> is one of the possible solutions that I've tried.
>> You actually have an interesting problem that isn't easily solved in
>> relational databases, but I don't think it's an appropriate problem if you
>> want to stress the use of coprocessors.
>> Yes with Indexes you want to use coprocessors as a way to keep the index
>> in synch with the underlying table.
>> However beyond that... the solution is really best run as a M/R job.
>> Consider that HBase has two different access methods: one is as part of
>> M/R jobs, the other is a client/server model.  If you wanted to, you could
>> create a service/engine/app that would allow you to efficiently query and
>> return result sets from your database, as well as manage indexes.
>> In part, coprocessors make this a lot easier.
> I'm not using the coprocessors to maintain index tables, but using an
> extended client to do this.
>> If you consider the general flow of my solution earlier in this thread,
>> you now have a really great way to implement this.
>> Note: we're really talking about allowing someone to query data from a
>> table using multiple indexes and index types. Think alternate table
>> (key/value pair) , Lucene/SOLR, and GeoSpatial.
>> You could even benchmark it against an Oracle implementation, and
>> probably smoke it.
>> You could also do efficient joins between tables.
>> So yeah, I would encourage you to work on your initial problem... ;-)
> An alternate table is also one of the possible solutions; however, it's not
> that easy either.  I'm still working on it. ;-)
> --
> Best Regards!
> Fei Ding