Pere Ferrera 2012-05-03, 12:08
Tom Brown 2012-05-03, 15:01
Re: aggregation performance
James Taylor 2012-05-03, 17:02
We've seen reasonable performance, with the caveat that you need to
parallelize the scan doing the aggregation. In our benchmarking, we have
the client scan each region in parallel and have a coprocessor aggregate
the row count and return a single row back (with the client then totaling
the counts it gets back).

Here are the numbers we've seen when aggregating 1 million rows, with a
slightly older HBase version (~0.92):

Schema: 50 col x 50 bytes with compressible data

Regions   RowCount     RowCount with single binary filter
          Time (sec)   Time (sec)
1         11.3         19.0
4          3.5          5.6
16         1.8          2.6
32         1.2          1.8

Schema: 1 col x 2500 bytes with compressible data

Regions   RowCount     RowCount with single binary filter
          Time (sec)   Time (sec)
1          7.0          7.0
4          1.2          1.2
16         0.7          0.7
32         0.3          0.3

This was run on a four-machine cluster, with each machine having a 4G heap
and with the servers warmed up (cached data).

Hope this helps.

James

On 05/03/2012 08:01 AM, Tom Brown wrote:
> For our solution we are doing some aggregation on the server via
> coprocessors. In general, for each row there are 8 columns: 7 columns
> that contain numbers (for summation) and 1 column that contains a
> hyperloglog counter (about 700 bytes). Functionally, this solution
> works well and ought to scale with the number of region servers.
> However, the individual request performance leaves a little to be
> desired. What we've seen is that to scan 40000 rows (aggregated into
> 3000 rows) takes about 4 seconds.
>
> Our code is in its early stages (unoptimized) so we hope to see some
> significant performance improvements when we run our coprocessor under
> a profiler. Our benchmarks were on underpowered machines (only 2 GB
> RAM) as well.
>
> Hope this helps!
>
> --Tom
>
> On Thu, May 3, 2012 at 6:08 AM, Pere Ferrera <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> Is anybody benchmarking the performance of server-side aggregations through
>> co-processors in HBase? I am interested to know if HBase could potentially
>> be used to calculate real-time SQL-like aggregations at a good level of
>> performance (q < 200ms on high-load, big dataset scenario). Just curious to
>> know before I implement my own benchmarks.
>>
>> Pere.
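
The per-region pattern James describes (one scan per region in parallel, each returning a single count, with the client totaling the results) can be sketched roughly as follows. This is only an in-memory simulation using the Python standard library, not the HBase client or coprocessor API: `count_region`, `parallel_row_count`, and `split_into_regions` are hypothetical names, and each "region" is just a list of rows.

```python
from concurrent.futures import ThreadPoolExecutor


def count_region(rows, row_filter=None):
    """Stand-in for the server-side step: count one region's rows.

    Assumption: in a real deployment this work would run inside a
    coprocessor on the region server; here it is a plain function.
    """
    if row_filter is None:
        return len(rows)
    return sum(1 for row in rows if row_filter(row))


def parallel_row_count(regions, row_filter=None):
    """Client-side step: one task per region, then total the partial counts."""
    if not regions:
        return 0
    with ThreadPoolExecutor(max_workers=len(regions)) as pool:
        # Each task returns a single number, mirroring the "single row back".
        partial_counts = pool.map(lambda r: count_region(r, row_filter), regions)
        return sum(partial_counts)


def split_into_regions(rows, n):
    """Hypothetical helper: split rows into at most n contiguous chunks."""
    size = max(1, -(-len(rows) // n))  # ceiling division, at least 1
    return [rows[i:i + size] for i in range(0, len(rows), size)]


if __name__ == "__main__":
    rows = list(range(1000))
    regions = split_into_regions(rows, 4)
    print(parallel_row_count(regions))
    print(parallel_row_count(regions, row_filter=lambda r: r % 2 == 0))
```

The point of the design is that only one small value per region crosses the network, so the client's work stays constant in the number of rows and grows only with the number of regions.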
Himanshu Vashishtha 2012-05-03, 17:08