HBase user mailing list: endpoint coprocessor performance


Re: endpoint coprocessor performance
Please disregard. James may have nailed it and that's not version
dependent.

On Tuesday, March 5, 2013, Andrew Purtell wrote:

> Do you have timing results for an Apache HBase release? Our last release
> was 0.94.5.
>
> On Tuesday, March 5, 2013, Kim Hamilton wrote:
>
>> Hi all,
>> I've been lurking here for a while, so thanks for all the valuable tips and
>> guidance you've given so far.
>>
>> I'm running some experiments to understand where to use coprocessors. One
>> interesting scenario is computing distinct values. I ran performance tests
>> with two distinct value implementations: one using endpoint coprocessors,
>> and one using just scans (computing distinct values client side only). I
>> noticed that the endpoint coprocessor implementation averaged 80 ms slower
>> than the scan implementation. Details of that are below for anyone
>> interested.
>>
>> To drill into the performance, I instrumented the code and ultimately
>> deployed a no-op endpoint coprocessor, to look at the overhead of simply
>> calling it. I'm measuring around 100 ms for calling my empty, no-op endpoint
>> coprocessor.
>>
>> I need to do more tests, but I believe my tests are leading me to similar
>> conclusions drawn here:
>> http://hbase-coprocessor-experiments.blogspot.com/2011/05/extending.html
>>
>> I.e. if the query/scan is selective enough (I'll go out on a limb and
>> estimate 50-100 rows), then it's better to just perform a scan and compute
>> client side. Endpoint coprocessors will make sense for larger result sets
>> and/or scans that hit multiple regions.
>>
>> Before going too far, I wanted to check if anyone in this group has
>> suggestions. I.e. perhaps there are just some configuration options I've
>> not uncovered. Does this 100ms latency sound correct?
>>
>> Thanks,
>> Kim
>>
>>
>> *Detailed results of distinct value comparison, just FYI*
>>
>> Using 0.92.1-cdh4.1.0
>> Scan result size: ~50-100 rows
>> Row size: 1 KB, or 380 bytes after filtering for only the desired columns
>>
>> Metric                      Endpoint coprocessor   Scan-only, client-side distinct
>> AverageLatency(ms)          176.1353               92.8165
>> MinLatency(ms)              42                     4
>> MaxLatency(ms)              2368                   986
>> 95thPercentileLatency(ms)   321                    253
>> 99thPercentileLatency(ms)   422                    356
>>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
>

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
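The two distinct-value strategies Kim compares can be modeled without a cluster. This is a minimal, hypothetical Java sketch, not HBase client or coprocessor code: each region is assumed to be a plain list of rows, each row a map from column name to value, and the class and method names are illustrative only. The scan-only path ships every matching row to the client and deduplicates there; the endpoint-style path computes a partial distinct set per region, next to the data, and returns only those sets for the client to merge.

```java
import java.util.*;

// Hypothetical model of the two strategies; no HBase APIs are used.
// A "region" is a List of rows; a row is a Map of column -> value.
class DistinctValues {

    // Scan-only: every matching row reaches the client, which deduplicates locally.
    static Set<String> distinctClientSide(List<List<Map<String, String>>> regions, String column) {
        Set<String> distinct = new TreeSet<>();
        for (List<Map<String, String>> region : regions) {
            for (Map<String, String> row : region) { // each row crosses the network
                String v = row.get(column);
                if (v != null) distinct.add(v);
            }
        }
        return distinct;
    }

    // Endpoint-style, server side: one region computes its own partial distinct set.
    static Set<String> partialDistinct(List<Map<String, String>> region, String column) {
        Set<String> partial = new HashSet<>();
        for (Map<String, String> row : region) {
            String v = row.get(column);
            if (v != null) partial.add(v);
        }
        return partial;
    }

    // Endpoint-style, client side: one call per region, then merge the partial sets.
    static Set<String> distinctViaPerRegionPartials(List<List<Map<String, String>>> regions, String column) {
        Set<String> merged = new TreeSet<>();
        for (List<Map<String, String>> region : regions) {
            merged.addAll(partialDistinct(region, column));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Map<String, String>>> regions = List.of(
            List.of(Map.of("color", "red"), Map.of("color", "blue"), Map.of("color", "red")),
            List.of(Map.of("color", "green"), Map.of("color", "blue")));
        System.out.println(distinctClientSide(regions, "color"));          // [blue, green, red]
        System.out.println(distinctViaPerRegionPartials(regions, "color")); // [blue, green, red]
    }
}
```

Both paths return the same set; what differs in a real deployment is where the work runs and how much data crosses the network. That is why, in Kim's reasoning, a fixed per-call overhead weighs most on small, selective scans.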
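To produce numbers in the same shape as Kim's results (average, min, max, 95th and 99th percentile), a simple harness can time repeated calls and summarize the samples. This is a hypothetical sketch: the `LatencySummary` class and its methods are illustrative, the no-op `Runnable` stands in for whatever call is being measured (for example a client call to an empty endpoint), and percentiles use the nearest-rank method on raw samples, which may not match how the tool behind Kim's numbers computes them.

```java
import java.util.*;

// Hypothetical timing harness; prints "Metric(ms), value" lines like the results above.
class LatencySummary {

    // Times each invocation of `call` and returns per-call latencies in milliseconds.
    static double[] time(Runnable call, int iterations) {
        double[] samplesMs = new double[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            call.run();
            samplesMs[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        return samplesMs;
    }

    // Nearest-rank percentile: smallest sample with at least p% of samples <= it.
    // Expects a non-empty, ascending-sorted array.
    static double percentile(double[] sorted, double p) {
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Summarizes a non-empty set of samples; insertion order keeps the report readable.
    static Map<String, Double> summarize(double[] samplesMs) {
        double[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        double sum = 0;
        for (double s : sorted) sum += s;
        Map<String, Double> out = new LinkedHashMap<>();
        out.put("AverageLatency(ms)", sum / sorted.length);
        out.put("MinLatency(ms)", sorted[0]);
        out.put("MaxLatency(ms)", sorted[sorted.length - 1]);
        out.put("95thPercentileLatency(ms)", percentile(sorted, 95));
        out.put("99thPercentileLatency(ms)", percentile(sorted, 99));
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for the call under test; swap in the real client invocation.
        Runnable noOp = () -> { };
        summarize(time(noOp, 1000)).forEach((k, v) -> System.out.println(k + ", " + v));
    }
}
```

Timing on the client like this captures the end-to-end latency of each call; for a remote call it lumps network and server time together, so separating them needs instrumentation on the server side as well.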