HBase >> mail # user >> RE: endpoint coprocessor performance


Kim Hamilton 2013-03-05, 01:14
Re: endpoint coprocessor performance
Do you have timing results for an Apache HBase release? Our last release
was 0.94.5.

On Tuesday, March 5, 2013, Kim Hamilton wrote:

> Hi all,
> I've been lurking here for a while, so thanks for all the valuable tips and
> guidance you've given so far.
>
> I'm running some experiments to understand where to use coprocessors. One
> interesting scenario is computing distinct values. I ran performance tests
> with two distinct value implementations: one using endpoint coprocessors,
> and one using just scans (computing distinct values client side only). I
> noticed that the endpoint coprocessor implementation averaged about 80 ms
> slower than the scan implementation. Details are below for anyone
> interested.
>
> To drill into the performance, I instrumented the code and ultimately
> deployed a no-op endpoint coprocessor to look at the overhead of simply
> calling it. I'm measuring around 100 ms per call to my empty, no-op
> endpoint coprocessor.
>
> I need to do more tests, but I believe my results are leading me to
> conclusions similar to those drawn here:
> http://hbase-coprocessor-experiments.blogspot.com/2011/05/extending.html
>
> I.e. if the query/scan is selective enough (I'll go out on a limb and
> estimate 50-100 rows), then it's better to just perform a scan and compute
> client side. Endpoint coprocessors will make sense for larger result sets
> and/or scans that hit multiple regions.
>
> Before going too far, I wanted to check if anyone in this group has
> suggestions. For example, perhaps there are configuration options I
> haven't uncovered. Does this 100 ms latency sound right?
>
> Thanks,
> Kim
>
>
> *Detailed results of distinct value comparison, just FYI*
>
> Using 0.92.1-cdh4.1.0
> Scan result size ~50-100 rows
> Row size 1 KB, but after filtering for only desired columns, 380 bytes
>
> *with coprocessors*
> AverageLatency(ms), 176.1353
> MinLatency(ms), 42
> MaxLatency(ms), 2368
> 95thPercentileLatency(ms), 321
> 99thPercentileLatency(ms), 422
>
> *scan-only, compute distinct values client side*
> AverageLatency(ms), 92.8165
> MinLatency(ms), 4
> MaxLatency(ms), 986
> 95thPercentileLatency(ms), 253
> 99thPercentileLatency(ms), 356
>
--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
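
Editor's note: to compare a no-op endpoint call against a plain scan, it helps to collect both sets of latencies with the same harness. Below is a minimal sketch of one, assuming nothing about HBase itself. The class name LatencyStats and its methods are hypothetical, the timed action is a placeholder for the real endpoint call or scan, and the percentile uses the nearest-rank method, which may differ from what the original benchmark tool used. For reference, the two reported averages differ by 176.1353 - 92.8165, or about 83.3 ms.

```java
import java.util.Arrays;

/** Hypothetical helper; not part of HBase or of the original benchmark. */
public class LatencyStats {

    /** Arithmetic mean of the samples, in the same unit as the input. */
    public static double average(long[] samplesMs) {
        long sum = 0;
        for (long s : samplesMs) {
            sum += s;
        }
        return (double) sum / samplesMs.length;
    }

    /** Nearest-rank percentile: the sample at rank ceil(p/100 * n) in sorted order. */
    public static long percentile(long[] samplesMs, double p) {
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Runs the action `calls` times and returns each call's latency in whole milliseconds. */
    public static long[] time(Runnable action, int calls) {
        long[] latenciesMs = new long[calls];
        for (int i = 0; i < calls; i++) {
            long start = System.nanoTime();
            action.run();
            latenciesMs[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return latenciesMs;
    }

    public static void main(String[] args) {
        // Placeholder action: substitute the endpoint invocation or the client-side scan here.
        long[] samples = time(() -> { }, 1000);
        System.out.println("AverageLatency(ms), " + average(samples));
        System.out.println("95thPercentileLatency(ms), " + percentile(samples, 95));
        System.out.println("99thPercentileLatency(ms), " + percentile(samples, 99));
    }
}
```

Running the same harness once with the no-op endpoint call and once with the scan keeps client-side measurement overhead identical on both sides, so any remaining gap is attributable to the call path rather than to the timing code.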