HBase >> mail # user >> RE: endpoint coprocessor performance


Andrew Purtell 2013-03-05, 02:05
Re: endpoint coprocessor performance
Check your logs to see whether your endpoint coprocessor is hitting
ZooKeeper on every invocation to figure out the region start key.
Unfortunately (at least the last time I checked), the default way of
invoking an endpoint coprocessor doesn't use the meta cache. You can use
a combination of the following instead:
     // locateRegion can be served from the client's cache; relocateRegion refreshes it
     HRegionLocation regionLocation = retried ?
         connection.relocateRegion(tableName, tableKey) :
         connection.locateRegion(tableName, tableKey);
     ...
Then call HConnection.processExecs, passing in the regionKeys from
above.
You can trap the error raised when the region has been relocated and try
again with retried = true; the meta cache is then updated when
relocateRegion is called.
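
To make the retry flow concrete, here is a minimal, self-contained sketch of the pattern described above. It does not use the HBase client at all: RegionLocator, RegionMovedException and callWithCachedLocation are hypothetical names invented for illustration, and plain strings stand in for HRegionLocation. The only thing taken from the snippet above is the idea of trying the cached lookup first and falling back to relocateRegion once when the call fails because the region moved.

```java
import java.util.function.Function;

public class CachedRegionLookup {

    /** Hypothetical stand-in for the two lookups on the connection. */
    public interface RegionLocator {
        // May be answered from the client-side cache.
        String locateRegion(String tableName, String rowKey);
        // Bypasses the cache and refreshes the cached entry.
        String relocateRegion(String tableName, String rowKey);
    }

    /** Hypothetical signal that the location we used was stale. */
    public static class RegionMovedException extends RuntimeException {
        public RegionMovedException(String message) {
            super(message);
        }
    }

    /**
     * Runs the call against the cached location first. If the call reports
     * that the region has moved, looks the region up again without the cache
     * and retries exactly once.
     */
    public static <R> R callWithCachedLocation(RegionLocator locator,
                                               String tableName,
                                               String rowKey,
                                               Function<String, R> call) {
        boolean retried = false;
        while (true) {
            String location = retried
                    ? locator.relocateRegion(tableName, rowKey)
                    : locator.locateRegion(tableName, rowKey);
            try {
                return call.apply(location);
            } catch (RegionMovedException e) {
                if (retried) {
                    throw e; // already refreshed once; give up
                }
                retried = true;
            }
        }
    }
}
```

In a real client the call passed in would be the coprocessor invocation, and the exception to trap would be whatever the client raises for a moved region; both are left abstract here on purpose.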

Once we made this change for Phoenix, our latencies went way down.

HTH,

     James

On 03/04/2013 05:43 PM, Andrew Purtell wrote:
> Do you have timing results for an Apache HBase release? Our last release
> was 0.94.5.
>
> On Tuesday, March 5, 2013, Kim Hamilton wrote:
>
>> Hi all,
>> I've been lurking here for a while, so thanks for all the valuable tips and
>> guidance you've given so far.
>>
>> I'm running some experiments to understand where to use coprocessors. One
>> interesting scenario is computing distinct values. I ran performance tests
>> with two distinct value implementations: one using endpoint coprocessors,
>> and one using just scans (computing distinct values client side only). I
>> noticed that the endpoint coprocessor implementation averaged 80 ms slower
>> than the scan implementation. Details of that are below for anyone
>> interested.
>>
>> To drill into the performance, I instrumented the code and ultimately
>> deployed a no-op endpoint coprocessor, to look at the overhead of simply
>> calling it. I'm measuring around 100ms for calling my empty, no-op endpoint
>> coprocessor.
>>
>> I need to do more tests, but I believe my tests are leading me to similar
>> conclusions drawn here:
>> http://hbase-coprocessor-experiments.blogspot.com/2011/05/extending.html
>>
>> I.e. if the query/scan is selective enough (I'll go out on a limb and
>> estimate 50-100 rows), then it's better to just perform a scan and compute
>> client side. Endpoint coprocessors will make sense for larger result sets
>> and/or scans that hit multiple regions.
>>
>> Before going too far, I wanted to check if anyone in this group has
>> suggestions. I.e. perhaps there are just some configuration options I've
>> not uncovered. Does this 100ms latency sound correct?
>>
>> Thanks,
>> Kim
>>
>>
>> *Detailed results of distinct value comparison, just FYI*
>>
>> Using 0.92.1-cdh4.1.0
>> Scan result size ~50-100
>> Row size 1kb, but after filtering for only desired columns, 380 bytes
>>
>> *with coprocessors*
>> AverageLatency(ms), 176.1353
>> MinLatency(ms), 42
>> MaxLatency(ms), 2368
>> 95thPercentileLatency(ms), 321
>> 99thPercentileLatency(ms), 422
>>
>> *scan-only, compute distinct values client side*
>> AverageLatency(ms), 92.8165
>> MinLatency(ms), 4
>> MaxLatency(ms), 986
>> 95thPercentileLatency(ms), 253
>> 99thPercentileLatency(ms), 356
>>
>