HBase >> mail # user >> Rule of thumb: Size of data to send per RPC in a scan


David Koch 2013-01-25, 23:59
Re: Rule of thumb: Size of data to send per RPC in a scan
It looks like HBASE-2214 'Do HBASE-1996 -- setting size to return in scan
rather than count of rows -- properly' may help you,
but that is only available in 0.96.

Lars H presented some performance numbers in
  HBASE-7008 'Set scanner caching to a better default, disable Nagles',
where the default for "hbase.client.scanner.caching" was changed to 100.

Cheers

On Fri, Jan 25, 2013 at 3:59 PM, David Koch <[EMAIL PROTECTED]> wrote:

> Hello,
>
> Is there a rule to determine the best batch/caching combination for
> maximizing scan performance as a function of KV size and (average) number
> of columns per row key?
>
> I have 0.5 KB per value (constant) and an average of 10 values per row key,
> heavy-tailed, so some outliers have 100k KVs, and around 100 million rows in the
> table. The cluster consists of 30 region servers with 24 GB of RAM each; nodes
> are connected by a 1 Gbit link. I am running Map/Reduce jobs on the
> table, also with 30 task trackers.
>
> I tried:
> cache 1, no batching -> 14 min
> cache 1000, batch 50 -> 11 min
> cache 5000, batch 25 -> crash (timeouts)
> cache 2000, batch 25 -> 15 min
>
> Job time can vary quite significantly depending on whatever activity
> (compactions?) is going on in the background. Also, I cannot probe for the
> best combination indefinitely since there are actual production jobs queued. I
> did expect a larger speed-up compared with no caching/batching at all -
> is this unjustified?
>
> In short, I am looking for some tips for making scans in a Map/Reduce
> context faster :-)
>
> Thank you,
>
> /David
>
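
One rough way to compare the caching/batch combinations quoted above is to estimate an upper bound on the value payload a single scanner RPC could carry. The sketch below assumes that caching (Scan.setCaching) counts Results per RPC and that batch (Scan.setBatch) caps the KVs per Result, so a wide row is split into several Results. The class and method names are hypothetical, the figures are taken from the quoted message (0.5 KB per value, outlier rows of 100k KVs), and per-KV key overhead (row key, family, qualifier, timestamp) is ignored. It is back-of-the-envelope arithmetic, not a measurement.

```java
public class ScanRpcEstimate {

    // Upper bound on KVs in one Result. A positive batch caps it;
    // batch <= 0 is used here to mean "no batching", so a Result holds a whole row.
    public static long maxKvsPerResult(int batch, long maxKvsPerRow) {
        return batch > 0 ? Math.min(batch, maxKvsPerRow) : maxKvsPerRow;
    }

    // Worst case: every one of the `caching` Results in the RPC is full.
    public static long worstCaseBytesPerRpc(int caching, int batch,
                                            long maxKvsPerRow, long kvBytes) {
        return caching * maxKvsPerResult(batch, maxKvsPerRow) * kvBytes;
    }

    public static void main(String[] args) {
        long kvBytes = 512;          // 0.5 KB per value, from the quoted message
        long maxKvsPerRow = 100_000; // outlier row width, from the quoted message

        // {caching, batch} pairs tried in the quoted message; batch 0 = no batching
        int[][] settings = {{1, 0}, {1000, 50}, {5000, 25}, {2000, 25}};
        for (int[] s : settings) {
            long bytes = worstCaseBytesPerRpc(s[0], s[1], maxKvsPerRow, kvBytes);
            System.out.printf("caching=%d batch=%d -> up to %.1f MB per RPC%n",
                    s[0], s[1], bytes / 1e6);
        }
    }
}
```

Under these assumptions the 5000/25 combination has the largest worst case of the four (about 64 MB of values per RPC, against roughly 25.6 MB for 1000/50 and 2000/25), which may be one factor behind the timeouts, although the sketch cannot confirm that.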
David Koch 2013-01-27, 22:25