HBase >> mail # user >> EndPoint Coprocessor could be deadlocked?

Re: EndPoint Coprocessor could be deadlocked?
Sorry for the delay... Had a full day yesterday...

In a nutshell... Tough nut to crack.  I can give you a solution that you can probably enhance...

To start, ignore coprocessors for now...

So what you end up doing is the following.

General solution... N indexes..
Create a temp table in HBase. (1 column foo)

Assuming that you have a simple K,V index, so you just need to do a simple get() against the index to get the list of rows ...

For each index, fetch the rows.
For each row, write the rowid to the temp table and increment a counter in column foo.

Then scan the temp table for rows where foo's counter >= N. Note that it should == N, but just in case...

Now you have found the rows that appear in all N indexes.

Having said that...
Again assuming your indexes are a simple K,V pair where V is a set of row ids...

Create a hash map of <rowid, count>
For each index:
    Get() row based on key
    For each rowid in row:
        If map.fetch(rowid) is null then add (rowid, 1)
        Else increment the count for rowid
For each (rowid, count) in map:
    If count == number of indexes N
    Then add rowid to result set.

Now just return the rows whose rowid is in the result set.
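
A minimal Java sketch of the hash-map counting steps above, for illustration only. It assumes each index lookup has already been fetched into a collection of row id strings. The class name IndexIntersection and the method intersect are hypothetical, and no HBase calls are shown. Each index's row ids are de-duplicated first so that a repeated id within one index cannot inflate its count.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: not part of HBase or of the original post.
final class IndexIntersection {

    /**
     * Returns the row ids that appear in every index lookup result.
     * Each element of indexHits holds the row ids returned by one index.
     */
    static Set<String> intersect(List<? extends Collection<String>> indexHits) {
        Map<String, Integer> counts = new HashMap<>();
        for (Collection<String> hits : indexHits) {
            // De-duplicate so one index cannot count the same row id twice.
            for (String rowId : new HashSet<>(hits)) {
                counts.merge(rowId, 1, Integer::sum);
            }
        }

        int n = indexHits.size();
        Set<String> result = new TreeSet<>(); // sorted, so output order is stable
        if (n == 0) {
            return result; // no indexes queried, nothing to intersect
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() == n) { // seen once in each of the N indexes
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> hits = Arrays.asList(
                Arrays.asList("row1", "row2", "row3"),
                Arrays.asList("row2", "row3", "row4"),
                Arrays.asList("row3", "row2", "row9"));
        System.out.println(intersect(hits)); // prints [row2, row3]
    }
}
```

The map holds one entry per distinct row id seen across all indexes, so its size grows with the index results rather than with the final result set.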

That you can do in a coprocessor...
but you may have a memory issue, depending on the number of rowids in your index.

Does that help?
Sent from a remote device. Please excuse any typos...

Mike Segel

On May 14, 2012, at 8:20 AM, fding hbase <[EMAIL PROTECTED]> wrote:

> Hi Michel,
> I indexed each column within a column family of a table, so we can query a
> row with specific column value.
> By multi-index I mean using multiple indexes at the same time on a single
> query. That looks like a SQL select
> with two *where* clauses of two indexed columns.
> The row key of index table is made up of column value and row key of
> indexed table. For set intersection
> I used the utility class from the Apache Commons Collections package
> CollectionUtils.intersection(). There's no
> assumption on sort order on indices. A scan with column value as startKey
> and column value+1 as endKey
> applied to index table will return all rows in indexed table with that
> column value.
> For multi-index queries, previously I tried to use a scan for each index
> column and intersect of those
> result sets to get the rows that I want. But the query time is too long. So
> I decided to move the computation of
> intersection to server side and reduce the amount of data transferred.
> Do you have any better idea?
> On Mon, May 14, 2012 at 8:17 PM, Michel Segel <[EMAIL PROTECTED]> wrote:
>> Need a little clarification...
>> You said that you need to do multi-index queries.
>> Did you mean to say multiple people running queries at the same time, or
>> did you mean you wanted to do multi-key indexes where the key is a
>> multi-key part.
>> Or did you mean that you really wanted to use multiple indexes at the same
>> time on a single query?
>> If it's the latter, not really a good idea...
>> How do you handle the intersection of the two sets? (3 sets or more?)
>> Can you assume that the indexes are in sort order?
>> What happens when the results from the indexes exceed the amount of
>> allocated memory?
>> What I am suggesting you do is to set aside the underpinnings of HBase
>> and look at the problem you are trying to solve in general terms.  Not an
>> easy one...
>> Sent from a remote device. Please excuse any typos...
>> Mike Segel
>> On May 14, 2012, at 4:35 AM, fding hbase <[EMAIL PROTECTED]> wrote:
>>> Hi all,
>>> Is it possible to use table scanner (different from the host table region)
>>> or execute coprocessor of another table, in the endpoint coprocessor?
>>> It looks like chaining coprocessors. But I found a possible deadlock!
>>> Can anyone help me with this?
>>> In my testing environment I deployed the 0.92.0 version from CDH.
>>> I wrote an Endpoint coprocessor to do composite secondary index queries.