HBase >> mail # user >> Re: HBase aggregate query


Doug Meil 2012-09-10, 15:21
James Taylor 2012-09-11, 00:49
lars hofhansl 2012-09-11, 17:59
Jerry Lam 2012-09-11, 18:48
Re: HBase aggregate query
No, there's no sorted dimension. This would be a full table scan over
40M rows. The ~10-12 second figure assumes the following:
1) your regions are evenly distributed across a four-node cluster
2) the number of unique month * scene combinations is small enough to fit into memory
3) you chunk the scan up on the client side and run the chunks in parallel
(and have a final merge phase on the client)
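
To make the chunk / parallelize / merge idea in the list above concrete, here is a minimal sketch in plain Python. It does not use any HBase API: the rows are a small in-memory list, and the names (aggregate_chunk, merge_partials, parallel_group_by, EVENTS) are hypothetical, chosen only for illustration. In a real setup the per-chunk step would presumably run server-side over a key range, and only the small partial results would travel back to the client.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from datetime import date


def aggregate_chunk(rows):
    """Partial GROUP BY month(eventdate), scene for one chunk of rows.

    Each row is a hypothetical (eventdate, scene, timespent) tuple standing
    in for data that would really be read from one key range of the table.
    """
    partial = defaultdict(lambda: [0, 0])  # (month, scene) -> [count, sum]
    for eventdate, scene, timespent in rows:
        acc = partial[(eventdate.month, scene)]
        acc[0] += 1
        acc[1] += timespent
    return dict(partial)


def merge_partials(partials):
    """Final client-side merge: add up counts and sums per group."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (count, total) in partial.items():
            merged[key][0] += count
            merged[key][1] += total
    return {key: tuple(value) for key, value in merged.items()}


def parallel_group_by(rows, num_chunks=4):
    """Split rows into chunks, aggregate them in parallel, then merge."""
    size = max(1, -(-len(rows) // num_chunks))  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=num_chunks) as pool:
        partials = list(pool.map(aggregate_chunk, chunks))
    return merge_partials(partials)


# Made-up sample data, only to exercise the functions above.
EVENTS = [
    (date(2012, 8, 30), "intro", 12),
    (date(2012, 8, 31), "intro", 8),
    (date(2012, 9, 1), "intro", 5),
    (date(2012, 9, 2), "level1", 30),
    (date(2012, 9, 3), "level1", 20),
]
RESULT = parallel_group_by(EVENTS)  # {(month, scene): (count, sum_timespent)}
```

The merge is cheap only because of assumption 2: each partial result holds one entry per distinct (month, scene) pair, so it stays small no matter how many rows a chunk scans.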
On 09/11/2012 10:59 AM, lars hofhansl wrote:
> That's when you aggregate along a sorted dimension (prefix of the key), though. Right?
> Not sure how smart Hive is here, but if it needs to sort the data it will probably be slower than SQL Server for such a small data set.
>
>
>
> ----- Original Message -----
> From: James Taylor<[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Cc:
> Sent: Monday, September 10, 2012 5:49 PM
> Subject: Re: HBase aggregate query
>
> iwannaplay games<funnlearnforkids@...>  writes:
>> Hi ,
>>
>> I want to run a query like
>>
>> select month(eventdate),scene,count(1),sum(timespent) from eventlog
>> group by month(eventdate),scene
>>
>> in HBase. Through Hive it's taking a lot of time for 40 million
>> records. Do we have any syntax in HBase to find its result? In SQL
>> Server it takes around 9 minutes. How long might it take in HBase?
>>
>> Regards
>> Prabhjot
>>
>>
> Hi,
> In our internal testing using server-side coprocessors for aggregation, we've
> found HBase can process these types of queries very quickly: ~10-12 seconds
> using a four node cluster. You need to chunk up and parallelize the work on the
> client side to get this kind of performance, though.
> Regards,
>
> James
>
iwannaplay games 2012-09-10, 14:22
Srinivas Mupparapu 2012-09-10, 14:16