HBase >> mail # user >> EndPoint Coprocessor could be deadlocked?


Re: EndPoint Coprocessor could be deadlocked?
Ok...

I think you need to step away from your solution and take a look at the problem from a different perspective.

From my limited understanding of co-processors, this doesn't fit well with what you want to do.
I don't believe that you want to run a M/R query within a Co-processor.

In short, if I understood your problem, your goal is to pull data efficiently from a table using the intersection of 2 or more indexes.
 
Note: Most people create composite indexes, but it's possible that you want to index data against a column value along with a different type of index... like geospatial.

So here you need to capture the intersection of the index lists and then use that resulting subset as input to a M/R job to return the underlying data.  (Note: you can do this in a single client too.)

If you use a M/R job to fetch and process the result set, you would need to collect your intersection into a Java object, like an ordered list, which you can then split and pass off to each node.
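The last step above — collecting the intersection into an ordered list and splitting it across nodes — can be sketched in plain Java. This is only an illustration (class and method names are mine, not from the thread): it partitions an already-sorted row-id list into roughly equal slices, one per mapper.

```java
import java.util.*;

// Hypothetical sketch: once the index intersection has been collected
// into an ordered Java list, split it into roughly equal slices,
// one slice per mapper/node.
public class RowIdSplitter {
    public static List<List<String>> split(List<String> rowIds, int numSplits) {
        List<List<String>> splits = new ArrayList<>();
        int size = rowIds.size();
        for (int i = 0; i < numSplits; i++) {
            int from = i * size / numSplits;     // even partition bounds
            int to = (i + 1) * size / numSplits;
            splits.add(rowIds.subList(from, to));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("r1", "r2", "r3", "r4", "r5");
        System.out.println(split(ids, 2)); // [[r1, r2], [r3, r4, r5]]
    }
}
```

Because the list is ordered, each slice is a contiguous rowid range, which keeps each mapper's reads localized.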
On May 16, 2012, at 1:12 AM, fding hbase wrote:

> Hi Michel,
>
> Thanks for your reply. I believe your idea works both in theory and
> practice. But the problem I was worried about does not
> lie in memory usage, but in network performance. If I query all the
> indexed rows from index tables and pull all
> of them to client and push them to the temp table, then the
> client network overhead is heavy. If I can move the calculation to
> server side then the result will be reduced a lot after intersection.
>
> But sadly, HBase ipc doesn't allow coprocessor chaining mechanism...
> Someone mentioned on
> http://grokbase.com/t/hbase/user/116hrhhf8m/coprocessor-failure-question-and-examples
> :
>
> If a RegionObserver issues RPC to another table from any of the hooks that
> are called
> out of RPC handlers (for Gets, Puts, Deletes, etc.), you risk deadlock.
> Whatever activity
> you want to check should be in the same region as account data to avoid
> that.
> (Or HBase RPC needs to change.)
>
>
> So that means the deadlock is inevitable under the current circumstances.
> Coprocessors are still limited.
>
> What I'm seeking is a possible extension of coprocessors, or a workaround
> for situations where an extra RPC is needed inside the RPC handlers.
>
> By the way, the idea you described looks like what Apache
> commons-collections CollectionUtils.intersection() does.
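The CollectionUtils.intersection() mentioned above can be approximated with plain JDK collections. A minimal sketch (the class name is mine; note that commons-collections' version also respects element cardinality, which doesn't matter when row ids are unique within an index):

```java
import java.util.*;

// Plain-JDK equivalent of commons-collections CollectionUtils.intersection
// for collections of unique row ids: keep only the elements of `a` that
// also appear in `b`.
public class IntersectDemo {
    public static <T> List<T> intersection(Collection<T> a, Collection<T> b) {
        List<T> out = new ArrayList<>(a);
        out.retainAll(new HashSet<>(b)); // HashSet makes contains() O(1)
        return out;
    }

    public static void main(String[] args) {
        System.out.println(intersection(List.of("r1", "r2", "r3"),
                                        List.of("r2", "r3", "r4"))); // [r2, r3]
    }
}
```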
>
> On Tue, May 15, 2012 at 8:23 PM, Michel Segel <[EMAIL PROTECTED]>wrote:
>
>> Sorry for the delay... Had a full day yesterday...
>>
>> In a nut shell... Tough nut to crack.  I can give you a solution which you
>> can probably enhance...
>>
>> At the start, ignore coprocessors for now...
>>
>> So what you end up doing is the following.
>>
>> General solution... N indexes..
>> Create a temp table in HBase. (1 column foo)
>>
>> Assuming that you have a simple K,V index, you just need to do a simple
>> get() against the index to get the list of rows...
>>
>> For each index, fetch the rows.
>> For each row, write the rowid and then auto-increment a counter in a
>> column foo.
>>
>> Then scan the table where foo's counter >= N. Note that it should be == N,
>> but >= just in case...
>>
>> Now you have the rows found in all of the indexes.
>>
>> Having said that...
>> Again assuming your indexes are a simple K,V pair where V is a set of row
>> ids...
>>
>> Create a hash map of <rowid, count>
>> For each index:
>>     get() the row based on the key
>>     For each rowid in the row:
>>         If map.fetch(rowid) is null then add (rowid, 1)
>>         Else increment the count
>> For each (rowid, count) in the map:
>>     If count == number of indexes N
>>     Then add rowid to the result set
>>
>> Now just return the rows whose rowids are in the result set.
>>
>> That you can do in a coprocessor... but you may have a memory issue,
>> depending on the number of rowids in your index.
>>
>>
>>
>> does that help?
>>
>>
>> Sent from a remote device. Please excuse any typos...
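The counting scheme in the quoted message — write each rowid to a temp table, auto-increment a counter column foo, then scan for counters >= N — can be simulated in plain Java without a cluster. This is a sketch under that assumption (in real HBase the merge() would be an Increment against the temp table, and the final loop would be a Scan):

```java
import java.util.*;

// In-memory simulation of the temp-table approach: for each index,
// record every matching rowid and increment its counter column `foo`;
// then "scan" for counters >= N. A TreeMap stands in for the HBase
// temp table so the sketch is runnable without a cluster.
public class TempTableIntersect {
    public static List<String> intersect(List<List<String>> indexResults) {
        NavigableMap<String, Integer> tempTable = new TreeMap<>(); // rowid -> foo
        for (List<String> indexRows : indexResults) {
            for (String rowId : new HashSet<>(indexRows)) {  // dedupe per index
                tempTable.merge(rowId, 1, Integer::sum);     // auto-increment foo
            }
        }
        int n = indexResults.size();
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Integer> e : tempTable.entrySet()) {
            if (e.getValue() >= n) {   // should be == N, but >= just in case
                hits.add(e.getKey());
            }
        }
        return hits;                   // scan order = sorted rowids
    }

    public static void main(String[] args) {
        System.out.println(intersect(List.of(
                List.of("r1", "r2"),
                List.of("r2", "r3")))); // [r2]
    }
}
```

The TreeMap keeps the result ordered by rowid, which matches the scan order you would get from a real HBase temp table.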