HBase, mail # user - Fanning out hbase queries in parallel


Re: Fanning out hbase queries in parallel
Sonal Goyal 2011-07-25, 04:03
Hi Paul,

Have you taken a look at HBase coprocessors? I think you will find them
useful.

Best Regards,
Sonal
Hadoop ETL and Data Integration <https://github.com/sonalgoyal/hiho>
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>

On Mon, Jul 25, 2011 at 8:13 AM, Paul Nickerson <[EMAIL PROTECTED]> wrote:

>
> I would like to implement a multidimensional query system that aggregates
> large amounts of data on-the-fly by fanning out queries in parallel. It
> should be fast enough for interactive exploration of the data and extensible
> enough to take sets of hundreds or thousands of dimensions with high
> cardinality, and aggregate them from high granularity to low granularity.
> Dimensions and their values are stored in the row key. For instance, row
> keys look like this:
> Foo=bar,blah=123
> and each row contains numerical values within its column families, such
> as plays=100, versioned by the date of calculation.
> A user wants the top "Foo" values with blah=123, sorted in descending order
> by total plays in July. My current thinking is that a query would be executed
> by grouping all Foo-prefixed row keys by region server and sending the query
> to each of those servers. Each region server iterates through all of its row
> keys that start with Foo=something,blah=, and passes the query on to all
> regions containing blahs that equal 123, which then contain play counts.
> Matching row keys, as well as the sum of all their play values within July,
> are passed back up the chain and sorted/truncated when possible.
>
>
> It seems quite complicated and would involve either modifying the HBase
> source code or, at the very least, using the deep internals of the API. Does
> this seem like a practical solution, or could someone offer some ideas?
>
>
> Thank you!
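
The fan-out described in the quoted message can be sketched in plain Python. This is only an illustration under stated assumptions: it does not use the HBase client or coprocessor API, each "region" is a hypothetical in-memory dict mapping a row key to {date: plays}, and the names parse_row_key, scan_region and top_values are invented for this example. Each region computes a partial sum per "Foo" value in parallel, and the caller merges the partials and truncates.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from datetime import date


def parse_row_key(key):
    """Split 'Foo=bar,blah=123' into {'Foo': 'bar', 'blah': '123'}."""
    return dict(part.split("=", 1) for part in key.split(","))


def scan_region(region, group_by, filters, start, end):
    """Partial aggregate for one (hypothetical, in-memory) region.

    Sums plays per value of `group_by` for rows whose other dimensions
    match `filters`, counting only versions dated within [start, end].
    """
    totals = Counter()
    for row_key, versions in region.items():
        dims = parse_row_key(row_key)
        if group_by not in dims:
            continue
        if any(dims.get(dim) != value for dim, value in filters.items()):
            continue
        totals[dims[group_by]] += sum(
            plays for day, plays in versions.items() if start <= day <= end
        )
    return totals


def top_values(regions, group_by, filters, start, end, limit):
    """Fan the query out to every region, then merge and truncate."""
    with ThreadPoolExecutor(max_workers=max(1, len(regions))) as pool:
        partials = pool.map(
            lambda region: scan_region(region, group_by, filters, start, end),
            regions,
        )
        merged = Counter()
        for partial in partials:
            merged.update(partial)  # adds counts for keys seen in several regions
    return merged.most_common(limit)


# Made-up sample data: two regions, plays versioned by calculation date.
regions = [
    {
        "Foo=a,blah=123": {date(2011, 6, 30): 999, date(2011, 7, 1): 100, date(2011, 7, 2): 50},
        "Foo=a,blah=456": {date(2011, 7, 1): 70},
    },
    {
        "Foo=b,blah=123": {date(2011, 7, 3): 200},
        "Foo=c,blah=123": {date(2011, 7, 4): 20},
    },
]

result = top_values(
    regions, "Foo", {"blah": "123"}, date(2011, 7, 1), date(2011, 7, 31), limit=2
)
print(result)
```

The sketch truncates only after merging. Truncating each partial result to the top N before the merge is safe only if all rows for a given "Foo" value are guaranteed to sit in a single region; if a value straddles a region boundary, its partial sums could be cut before they are combined.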