Re: Fanning out hbase queries in parallel
Which release(s) have coprocessors enabled?

Sent from a remote device. Please excuse any typos...

Mike Segel

On Jul 24, 2011, at 11:03 PM, Sonal Goyal <[EMAIL PROTECTED]> wrote:

> Hi Paul,
>
> Have you taken a look at HBase coprocessors? I think you will find them
> useful.
>
> Best Regards,
> Sonal
> Hadoop ETL and Data Integration <https://github.com/sonalgoyal/hiho>
> Nube Technologies <http://www.nubetech.co>
>
> <http://in.linkedin.com/in/sonalgoyal>
>
>
>
>
>
> On Mon, Jul 25, 2011 at 8:13 AM, Paul Nickerson <[EMAIL PROTECTED]> wrote:
>
>>
>> I would like to implement a multidimensional query system that aggregates
>> large amounts of data on the fly by fanning out queries in parallel. It
>> should be fast enough for interactive exploration of the data and extensible
>> enough to take sets of hundreds or thousands of dimensions with high
>> cardinality, and to aggregate them from high granularity to low granularity.
>> Dimensions and their values are stored in the row key. For instance, row
>> keys look like this:
>> Foo=bar,blah=123
>> and each row contains numerical values within its column families, such
>> as plays=100, versioned by the date of calculation.
>> A user wants the top "Foo" values with blah=123, sorted in descending order
>> by total plays in July. My current thinking is that a query would be
>> executed by grouping all Foo-prefixed row keys by region server and sending
>> the query to each of those servers. Each region server iterates through all
>> of its row keys that start with Foo=something,blah=, and passes the query on
>> to all regions whose blah values equal 123, which in turn contain the play
>> counts. Matching row keys, along with the sum of their play values within
>> July, are passed back up the chain and sorted/truncated where possible.
>>
>>
>> It seems quite complicated and would involve either modifying the HBase
>> source code or, at the very least, using the deep internals of the API. Does
>> this seem like a practical solution, or could someone offer some ideas?
>>
>>
>> Thank you!
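
For readers following along, here is a minimal sketch of the prefix-scan approach Paul describes above, written against the plain 0.90/0.92-era HBase client API (HTable, Scan, ResultScanner). The table name "plays", the column family/qualifier "metrics:plays", the assumption that play counts are stored as 8-byte longs, and the July 2011 time range are illustrative guesses, not details from the thread.

import java.io.IOException;
import java.util.Calendar;
import java.util.GregorianCalendar;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class FooPlaysScan {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    // Hypothetical table/column names; substitute the real schema.
    HTable table = new HTable(conf, "plays");
    byte[] family = Bytes.toBytes("metrics");
    byte[] qualifier = Bytes.toBytes("plays");

    // Restrict to cell versions whose timestamp (the "date of calculation")
    // falls in July 2011, and keep every version in that range.
    long julyStart = new GregorianCalendar(2011, Calendar.JULY, 1).getTimeInMillis();
    long julyEnd = new GregorianCalendar(2011, Calendar.AUGUST, 1).getTimeInMillis();

    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes("Foo="));
    scan.setStopRow(Bytes.toBytes("Foo>"));  // '>' sorts just after '=', ending the prefix
    scan.addColumn(family, qualifier);
    scan.setTimeRange(julyStart, julyEnd);
    scan.setMaxVersions();

    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        String key = Bytes.toString(r.getRow());
        // Keep only rows whose key also carries blah=123.
        if (!key.contains("blah=123")) {
          continue;
        }
        long plays = 0;
        // Sum every July version of the plays cell (assumes 8-byte long values).
        for (KeyValue kv : r.getColumn(family, qualifier)) {
          plays += Bytes.toLong(kv.getValue());
        }
        System.out.println(key + " -> " + plays);
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}

Note that the filtering, summing, and the final sort/truncate all happen on the client here; pushing that work down to the region servers is exactly what the coprocessor suggestion is about.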
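
And a rough sketch of the endpoint-coprocessor route Sonal and Mike point at, in the style of the CoprocessorProtocol / BaseEndpointCoprocessor API from the 0.92 line (to the best of my knowledge, coprocessors first shipped with 0.92, which was unreleased at the time of this thread). Every name here (SumProtocol, SumEndpoint, "metrics:plays") is hypothetical; the point is only to show per-region summing plus a client-side fan-out and merge, not the thread's actual code.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.Batch;
import org.apache.hadoop.hbase.coprocessor.BaseEndpointCoprocessor;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.util.Bytes;

// RPC interface the client sees; one call is made per region.
interface SumProtocol extends CoprocessorProtocol {
  long sumPlays(byte[] family, byte[] qualifier) throws IOException;
}

// Runs inside each region server and sums the column over that region's rows,
// so only a single long travels back over the wire per region.
class SumEndpoint extends BaseEndpointCoprocessor implements SumProtocol {
  public long sumPlays(byte[] family, byte[] qualifier) throws IOException {
    Scan scan = new Scan();
    scan.addColumn(family, qualifier);
    RegionCoprocessorEnvironment env =
        (RegionCoprocessorEnvironment) getEnvironment();
    InternalScanner scanner = env.getRegion().getScanner(scan);
    long total = 0;
    try {
      List<KeyValue> kvs = new ArrayList<KeyValue>();
      boolean more;
      do {
        more = scanner.next(kvs);
        for (KeyValue kv : kvs) {
          total += Bytes.toLong(kv.getValue());
        }
        kvs.clear();
      } while (more);
    } finally {
      scanner.close();
    }
    return total;
  }
}

// Client side: fan the call out to every region covering the Foo= key range
// and merge the per-region partial sums.
public class SumClient {
  public static void main(String[] args) throws Throwable {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "plays");
    Map<byte[], Long> partials = table.coprocessorExec(
        SumProtocol.class,
        Bytes.toBytes("Foo="), Bytes.toBytes("Foo>"),
        new Batch.Call<SumProtocol, Long>() {
          public Long call(SumProtocol instance) throws IOException {
            return instance.sumPlays(Bytes.toBytes("metrics"), Bytes.toBytes("plays"));
          }
        });
    long total = 0;
    for (long partial : partials.values()) {
      total += partial;
    }
    System.out.println("total plays = " + total);
    table.close();
  }
}

The endpoint has to be registered on the region servers (for example via hbase.coprocessor.region.classes or a table attribute) before the client call will resolve, and the July time range and the blah=123 restriction from the original question would still need to be pushed into the Scan inside the endpoint.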