HBase >> mail # user >> EndPoint Coprocessor could be deadlocked?


Re: EndPoint Coprocessor could be deadlocked?
Sorry for the delay... Had a full day yesterday...

In a nutshell: tough nut to crack. I can give you a solution that you can probably enhance.

To start, ignore coprocessors for now.

So what you end up doing is the following.

General solution for N indexes:
Create a temp table in HBase with one column, foo.

Assume you have a simple K,V index, so a simple get() against the index returns the list of rows.

For each index, fetch the rows.
For each row, write the rowid to the temp table and increment a counter in column foo.

Then scan the temp table for rows where foo's counter >= N. Note that it should be == N, but use >= just in case.

Now you have found the rows that appear in all N indexes.
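
The temp-table steps above can be sketched in a few lines. This is only an in-memory model, not HBase client code: a dict stands in for the temp table, each index is assumed to be a dict from an indexed value to a list of rowids, and the names rows_matching_all, color_index and size_index are made up for illustration.

```python
from collections import defaultdict


def rows_matching_all(indexes, keys):
    """Model the temp-table approach with a dict standing in for the HBase table.

    indexes: list of dicts, each mapping an indexed value to a list of rowids
    keys:    the value to look up in each index, in the same order
    """
    temp_table = defaultdict(int)  # rowid -> counter in column "foo"
    for index, key in zip(indexes, keys):
        # Simple get() against the index; a missing key yields no rowids.
        for rowid in index.get(key, []):
            temp_table[rowid] += 1  # stands in for the counter increment
    n = len(indexes)
    # "Scan" the temp table for rows whose counter is >= N.
    return sorted(rowid for rowid, count in temp_table.items() if count >= n)


# Hypothetical sample indexes on two columns.
color_index = {"red": ["r1", "r2", "r3"], "blue": ["r4"]}
size_index = {"large": ["r2", "r3", "r5"]}
print(rows_matching_all([color_index, size_index], ["red", "large"]))
```

In a real deployment the counter update would have to be an atomic increment on the temp table, since several writers may touch the same rowid.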

Having said that...
Again assuming your indexes are a simple K,V pair where V is a set of row ids...

Create a hash map of <rowid, count>
For each index:
    Get() row based on key
    For each rowid in row:
        If map.fetch(rowid) is null then add (rowid, 1)
        Else increment the value in count
For each rowid in map(rowid, count):
    If count == number of indexes N
    Then add rowid to result set.
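
A runnable version of this pseudocode might look like the following. It is a sketch under assumptions: each index lookup is taken to have already produced a plain list of rowids, and the function name intersect_by_count is invented here. One detail goes beyond the pseudocode: each list is de-duplicated first, so that a rowid repeated within one index cannot push the count past N or fake a match.

```python
from collections import Counter


def intersect_by_count(rowid_lists):
    """Return the rowids that occur in every one of the given lists."""
    counts = Counter()  # the hash map of <rowid, count>
    for rowids in rowid_lists:
        # set() counts a rowid at most once per index (an added assumption).
        counts.update(set(rowids))
    n = len(rowid_lists)
    return {rowid for rowid, count in counts.items() if count == n}


lists = [["a", "b", "c"], ["b", "c", "d"], ["c", "b", "b"]]
print(sorted(intersect_by_count(lists)))
```

The map holds one entry per distinct rowid seen in any index, which is where the memory concern comes from.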

Now just return the rows whose rowid is in the result set.

That you can do in a coprocessor, but you may have a memory issue, depending on the number of rowids in your index.

Does that help?
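
One possible way around the memory issue, which is not part of the solution above, is a streaming merge. It only works under a strong assumption: every index must hand back its rowids in strictly increasing order (the sort-order question from the earlier mail quoted below). The generator name sorted_intersection is hypothetical. It keeps just one current rowid per index in memory.

```python
def sorted_intersection(*iterables):
    """Yield values present in every input; inputs must be strictly increasing."""
    iterators = [iter(it) for it in iterables]
    if not iterators:
        return
    try:
        current = [next(it) for it in iterators]
        while True:
            highest = max(current)
            if all(value == highest for value in current):
                yield highest  # every index agrees on this rowid
                current = [next(it) for it in iterators]
            else:
                # Advance each iterator that is still behind the highest value.
                for i, it in enumerate(iterators):
                    while current[i] < highest:
                        current[i] = next(it)
    except StopIteration:
        # Any exhausted index means no further common rowids exist.
        return


print(list(sorted_intersection([1, 3, 5, 7, 9], [3, 4, 5, 9], [0, 3, 9, 10])))
```

If the indexes cannot guarantee sort order, this does not apply and the counting approach is the fallback.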
Sent from a remote device. Please excuse any typos...

Mike Segel

On May 14, 2012, at 8:20 AM, fding hbase <[EMAIL PROTECTED]> wrote:

> Hi Michel,
>
> I indexed each column within a column family of a table, so we can query a
> row with specific column value.
> By multi-index I mean using multiple indexes at the same time on a single
> query. That looks like a SQL select
> with two *where* conditions on two indexed columns.
>
> The row key of index table is made up of column value and row key of
> indexed table. For set intersection
> I used CollectionUtils.intersection() from the Apache Commons Collections
> package. There's no
> assumption on sort order on indices. A scan with column value as startKey
> and column value+1 as endKey
> applied to index table will return all rows in indexed table with that
> column value.
>
> For multi-index queries, previously I tried to use a scan for each index
> column and intersect those
> result sets to get the rows that I want. But the query time is too long. So
> I decided to move the computation of
> intersection to server side and reduce the amount of data transferred.
>
> Do you have any better idea?
>
> On Mon, May 14, 2012 at 8:17 PM, Michel Segel <[EMAIL PROTECTED]> wrote:
>
>> Need a little clarification...
>>
>> You said that you need to do multi-index queries.
>>
>> Did you mean to say multiple people running queries at the same time, or
>> did you mean you wanted to do multi-key indexes, where the index key is
>> made up of multiple parts?
>>
>> Or did you mean that you really wanted to use multiple indexes at the same
>> time on a single query?
>>
>> If it's the latter, not really a good idea...
>> How do you handle the intersection of the two sets? (3 sets or more?)
>> Can you assume that the indexes are in sort order?
>>
>> What happens when the results from the indexes exceed the amount of
>> allocated memory?
>>
>> What I am suggesting you do is to set aside the underpinnings of HBase
>> and look at the problem you are trying to solve in general terms.  Not an
>> easy one...
>>
>>
>>
>> Sent from a remote device. Please excuse any typos...
>>
>> Mike Segel
>>
>> On May 14, 2012, at 4:35 AM, fding hbase <[EMAIL PROTECTED]> wrote:
>>
>>> Hi all,
>>>
>>> Is it possible to use a table scanner (on a table different from the
>>> host table region), or execute a coprocessor of another table, in the
>>> endpoint coprocessor?
>>> It looks like chaining coprocessors. But I found a possible deadlock!
>>> Can anyone help me with this?
>>>
>>> In my testing environment I deployed the 0.92.0 version from CDH.
>>> I wrote an Endpoint coprocessor to do composite secondary index queries.