Re: HBase aggregate query
James Taylor 2012-09-13, 19:13
No, there's no sorted dimension. This would be a full table scan over
all 40M rows. The ~10-12 second figure assumes the following:
1) your regions are evenly distributed across a four-node cluster
2) the number of unique month * scene combinations is small enough to
fit into memory
3) you chunk the scan up on the client side, run the chunks in parallel,
and do a final merge phase on the client
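
To make the chunk/parallelize/merge idea concrete, here is a minimal sketch in plain Python. It does not use any HBase API: the in-memory ROWS list, the thread pool, and all function names are hypothetical stand-ins for the table, the parallel per-chunk scans, and the client code. It only shows the shape of the computation, where each chunk produces a partial (count, sum) per month * scene group and the client adds the partials together.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from datetime import date

# Hypothetical in-memory stand-in for eventlog rows: (eventdate, scene, timespent).
ROWS = [
    (date(2012, 8, 3), "lobby", 30),
    (date(2012, 8, 17), "lobby", 45),
    (date(2012, 8, 21), "arena", 10),
    (date(2012, 9, 1), "lobby", 20),
    (date(2012, 9, 9), "arena", 5),
    (date(2012, 9, 12), "arena", 15),
]


def split_into_chunks(rows, n_chunks):
    """Split rows into at most n_chunks slices (a stand-in for key ranges)."""
    size = max(1, -(-len(rows) // n_chunks))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def aggregate_chunk(chunk):
    """Partial GROUP BY (month, scene) -> [count, sum(timespent)] for one chunk."""
    partial = defaultdict(lambda: [0, 0])
    for eventdate, scene, timespent in chunk:
        acc = partial[(eventdate.month, scene)]
        acc[0] += 1
        acc[1] += timespent
    return dict(partial)


def merge_partials(partials):
    """Final client-side merge: add up counts and sums per group."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (count, total) in partial.items():
            merged[key][0] += count
            merged[key][1] += total
    return {key: tuple(value) for key, value in merged.items()}


def parallel_group_by(rows, n_chunks=4):
    """Aggregate each chunk in its own worker, then merge on the caller side."""
    chunks = split_into_chunks(rows, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(aggregate_chunk, chunks))
    return merge_partials(partials)


if __name__ == "__main__":
    for (month, scene), (count, total) in sorted(parallel_group_by(ROWS).items()):
        print(month, scene, count, total)
```

Because count and sum are both additive, the merge step only has to hold one small entry per month * scene group, which is why assumption 2) matters.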
On 09/11/2012 10:59 AM, lars hofhansl wrote:
> That's when you aggregate along a sorted dimension (prefix of the key), though. Right?
> Not sure how smart Hive is here, but if it needs to sort the data it will probably be slower than SQL Server for such a small data set.
> ----- Original Message -----
> From: James Taylor<[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Monday, September 10, 2012 5:49 PM
> Subject: Re: HBase aggregate query
> iwannaplay games<funnlearnforkids@...> writes:
>> Hi ,
>> I want to run query like
>> select month(eventdate),scene,count(1),sum(timespent) from eventlog
>> group by month(eventdate),scene
>> in HBase. Through Hive it's taking a lot of time for 40 million
>> records. Do we have any syntax in HBase to get this result? In SQL
>> Server it takes around 9 minutes. How long might it take in HBase?
> In our internal testing using server-side coprocessors for aggregation, we've
> found HBase can process these types of queries very quickly: ~10-12 seconds
> using a four node cluster. You need to chunk up and parallelize the work on the
> client side to get this kind of performance, though.