HBase user mailing list
Re: EndPoint Coprocessor could be deadlocked?
Ok...

I think you need to step away from your solution and take a look at the problem from a different perspective.

From my limited understanding of coprocessors, they don't fit well with what you want to do.
I don't believe you want to run an M/R query within a coprocessor.

In short, if I understand your problem correctly, your goal is to pull data efficiently from a table using the intersection of two or more indexes.
 
Note: Most people create composite indexes, but it's possible that you want to index data against a column value along with a different type of index, such as a geospatial one.

So here you need to capture the intersection of the index lists and then use the resulting subset as input into an M/R job to return the underlying data. (Note: you can do this in a single child too.)

If you use an M/R job to fetch and process the result set, you would need to put your intersection into a Java object, such as an ordered list, which you can then split and pass off to each node.
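A minimal Python sketch of the intersect-then-split idea, assuming each index lookup has already returned its list of row ids. The in-memory lists stand in for reads against the index tables, and `intersect_index_lists` / `split_into_chunks` are hypothetical helper names, not HBase APIs.

```python
from typing import Iterable, List, Sequence


def intersect_index_lists(index_lists: Sequence[Iterable[bytes]]) -> List[bytes]:
    """Return, in sorted order, the row ids present in every index list."""
    if not index_lists:
        return []
    common = set(index_lists[0])
    for row_ids in index_lists[1:]:
        common &= set(row_ids)  # keep only ids also present in this index
    return sorted(common)


def split_into_chunks(row_ids: List[bytes], num_splits: int) -> List[List[bytes]]:
    """Split an ordered list into at most num_splits contiguous, near-equal chunks."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    base, extra = divmod(len(row_ids), num_splits)
    chunks, start = [], 0
    for i in range(num_splits):
        # The first `extra` chunks take one additional element.
        end = start + base + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when there are fewer ids than splits
            chunks.append(row_ids[start:end])
        start = end
    return chunks


# Hypothetical index lookups (stand-ins for Gets against index tables).
index_a = [b"r1", b"r3", b"r5", b"r7"]
index_b = [b"r3", b"r4", b"r5", b"r7", b"r9"]
index_c = [b"r7", b"r5", b"r3"]

matches = intersect_index_lists([index_a, index_b, index_c])
print(matches)                        # [b'r3', b'r5', b'r7']
print(split_into_chunks(matches, 2))  # [[b'r3', b'r5'], [b'r7']]
```

Because the list is sorted before splitting, each chunk covers a contiguous run of row keys, which is one plausible way to give each M/R task its own slice of the result.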
On May 16, 2012, at 1:12 AM, fding hbase wrote:

> Hi Michel,
>
> Thanks for your reply. I believe your idea works both in theory and
> practice. But the problem I'm worried about lies not in memory usage
> but in network performance. If I query all the indexed rows from the
> index tables, pull all of them to the client, and push them to the temp
> table, the client's network overhead is heavy. If I can move the
> calculation to the server side, the result will be reduced a lot after
> the intersection.
>
> But sadly, HBase IPC doesn't provide a coprocessor chaining mechanism...
> Someone mentioned the following at
> http://grokbase.com/t/hbase/user/116hrhhf8m/coprocessor-failure-question-and-examples
>
> If a RegionObserver issues RPC to another table from any of the hooks that
> are called out of RPC handlers (for Gets, Puts, Deletes, etc.), you risk
> deadlock. Whatever activity you want to check should be in the same region
> as account data to avoid that. (Or HBase RPC needs to change.)
>
>
> So that means the deadlock is inevitable under the current circumstances.
> Coprocessors are still limited.
>
> What I'm looking for is a possible extension of coprocessors, or a
> workaround, for situations where an extra RPC is needed inside the RPC
> handlers.
>
> By the way, the idea you described looks like what Apache
> commons-collections CollectionUtils.intersection() does.
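To illustrate why an extra RPC issued from inside an RPC handler risks deadlock, here is a toy Python model. It assumes, as a simplification, that the server has a fixed pool of handler threads and that the nested call must wait for a free handler from that same pool; `nested_call_completes` and the thread pool are illustrative stand-ins, not HBase code. A pool with one handler plays the role of a busy server on which every handler is already occupied by a hook waiting on its own nested call.

```python
import concurrent.futures as cf


def nested_call_completes(handler_count: int, timeout: float = 0.5) -> bool:
    """Toy model: can an outer 'RPC' running on a bounded handler pool
    finish a nested 'RPC' that needs a handler from the same pool?"""
    handlers = cf.ThreadPoolExecutor(max_workers=handler_count)

    def nested_rpc() -> str:
        # Stands in for the extra RPC to another table.
        return "ok"

    def outer_rpc():
        # Stands in for a coprocessor hook running on an RPC handler thread.
        # It keeps its handler busy while waiting for the nested call.
        inner = handlers.submit(nested_rpc)
        try:
            return inner.result(timeout=timeout)
        except cf.TimeoutError:
            # A real handler could block indefinitely here; the timeout only
            # lets the toy report the stall instead of hanging.
            return None

    try:
        return handlers.submit(outer_rpc).result() == "ok"
    finally:
        handlers.shutdown(wait=True)


print(nested_call_completes(handler_count=1))  # False: the only handler is busy waiting
print(nested_call_completes(handler_count=2))  # True: a spare handler runs the nested call
```

In this model a bigger pool only postpones the problem: once enough concurrent requests occupy every handler while waiting on nested calls, no handler is left to serve those calls.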
>
> On Tue, May 15, 2012 at 8:23 PM, Michel Segel <[EMAIL PROTECTED]> wrote:
>
>> Sorry for the delay... Had a full day yesterday...
>>
>> In a nutshell... tough nut to crack. I can give you a solution which you
>> can probably enhance...
>>
>> To start, ignore coprocessors for now...
>>
>> So what you end up doing is the following.
>>
>> General solution... N indexes.
>> Create a temp table in HBase (1 column, foo).
>>
>> Assuming that you have a simple K,V index, you just need to do a simple
>> get() against the index to get the list of rows...
>>
>> For each index, fetch the rows.
>> For each row, write the rowid and then auto-increment a counter in
>> column foo.
>>
>> Then scan the table for rows where foo's counter >= N. Note that it should
>> equal N, but just in case...
>>
>> Now you have found the rows that match all N indexes.
>>
>> Having said that...
>> Again assuming your indexes are a simple K,V pair where V is a set of row
>> ids...
>>
>> Create a hash map of <rowid, count>
>> For each index:
>>     Get() row based on key
>>     For each rowid in row:
>>         If map.fetch(rowid) is null then add (rowid, 1)
>>         Else increment its count;
>>     ;
>> ;
>> For each (rowid, count) in map:
>>     If count == number of indexes N
>>     Then add rowid to result set.
>> ;
>>
>> Now just return the rows whose rowid is in the result set.
>>
>> That you can do in a coprocessor...
>> but you may have a memory issue, depending on the number of
>> rowids in your index.
>>
>>
>>
>> Does that help?
>>
>>
>> Sent from a remote device. Please excuse any typos...
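Michel's hash-map pseudocode translates fairly directly into Python. This is a sketch, assuming each index's value (the set of row ids returned by the Get()) has already been fetched into an in-memory list; the function name `intersect_by_count` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def intersect_by_count(index_values: List[Iterable[bytes]]) -> Set[bytes]:
    """Return the row ids that appear in all N index values.

    Mirrors the pseudocode: count how many indexes mention each row id,
    then keep the row ids whose count equals N.
    """
    n = len(index_values)
    counts: Dict[bytes, int] = defaultdict(int)
    for row_ids in index_values:
        # De-duplicate within one index so a repeated id cannot be counted twice.
        for row_id in set(row_ids):
            counts[row_id] += 1
    return {row_id for row_id, count in counts.items() if count == n}


# Hypothetical index values, standing in for Get() results on three indexes.
by_color = [b"row1", b"row2", b"row4"]
by_size = [b"row2", b"row3", b"row4"]
by_city = [b"row4", b"row2", b"row9"]

print(sorted(intersect_by_count([by_color, by_size, by_city])))  # [b'row2', b'row4']
```

The `counts` map holds every distinct row id seen across all indexes, not just the matches, which is where the memory concern Michel raises comes from when the indexes are large.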