Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

HBase user mailing list: Client Get vs Coprocessor scan performance


Kiru Pakkirisamy 2013-08-09, 01:43
Ted Yu 2013-08-09, 03:40
Kiru Pakkirisamy 2013-08-09, 05:21
Wukang Lin 2013-08-09, 06:00
Kiru Pakkirisamy 2013-08-09, 07:05
Ted Yu 2013-08-09, 05:44
Asaf Mesika 2013-08-17, 21:21
Ted Yu 2013-08-17, 23:19
Kiru Pakkirisamy 2013-08-18, 05:34
Ted Yu 2013-08-18, 13:39
Kiru Pakkirisamy 2013-08-18, 18:59
Re: Client Get vs Coprocessor scan performance
It would be interesting to compare against Phoenix's Skip Scan
(http://phoenix-hbase.blogspot.com/2013/05/demystifying-skip-scan-in-phoenix.html),
which does a scan through a coprocessor and is more than 2x faster
than multi Get (and it handles multi-range scans in addition to point
gets).

James
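
The skip-scan idea mentioned above is to walk the sorted key space once and seek directly to the start of each wanted range, rather than issuing one Get per key or reading every row in between. The sketch below illustrates that idea over an in-memory TreeMap standing in for a sorted HBase table. It is not Phoenix's implementation; the names `SkipScanSketch`, `skipScan`, and `KeyRange` are hypothetical, and the record syntax assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class SkipScanSketch {
    /** Half-open key range [start, end); a point get is a range holding one key. */
    public record KeyRange(String start, String end) {}

    /**
     * One forward pass over a sorted table. When the cursor falls before the
     * current range, it seeks straight to the range start instead of reading
     * the rows in between. Ranges must be sorted and non-overlapping.
     */
    public static List<String> skipScan(NavigableMap<String, String> table, List<KeyRange> ranges) {
        List<String> hits = new ArrayList<>();
        if (ranges.isEmpty()) {
            return hits;
        }
        int r = 0;
        String key = table.ceilingKey(ranges.get(0).start());
        while (key != null && r < ranges.size()) {
            KeyRange range = ranges.get(r);
            if (key.compareTo(range.end()) >= 0) {
                r++;                                   // past this range: move to the next one
            } else if (key.compareTo(range.start()) < 0) {
                key = table.ceilingKey(range.start()); // seek forward to the range start
            } else {
                hits.add(key);                         // inside the range: emit, then step
                key = table.higherKey(key);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            table.put(String.format("row%04d", i), "v" + i);
        }
        List<KeyRange> ranges = List.of(
                new KeyRange("row0010", "row0013"),  // a multi-row range
                new KeyRange("row0500", "row0501")); // effectively a point get
        System.out.println(skipScan(table, ranges)); // [row0010, row0011, row0012, row0500]
    }
}
```

Every step either advances the range index or moves the cursor strictly forward, so the pass always terminates; the rows between row0013 and row0500 are never read.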

On Aug 18, 2013, at 6:39 AM, Ted Yu <[EMAIL PROTECTED]> wrote:

> bq. Get'ting 100 rows seems to be faster than the FuzzyRowFilter (mask on
> the whole length of the key)
>
> In this case the Gets are very selective. The number of rows FuzzyRowFilter
> was evaluated against would be much higher.
> It would be nice if you could recall the time each took.
>
> bq. Also, I am seeing very bad concurrent query performance
>
> Were the multi Gets performed by your coprocessor within the region boundary
> of the respective coprocessor? Just to confirm.
>
> bq. that would make Coprocessors almost single threaded across multiple
> invocations ?
>
> Let me dig into code some more.
>
> Cheers
>
>
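
Ted's point about selectivity can be made concrete by counting rows read. Under the simplifying assumption that a filter without seek hints is evaluated on every row of the scanned range, 100 Gets read 100 rows while the filtered scan reads all 600K. This in-memory sketch uses hypothetical names and no HBase classes; it only counts work and does not model seeks, block caching, or RPC costs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Predicate;

public class SelectivitySketch {
    /** Point lookups: one probe per requested key, independent of table size. */
    public static int rowsTouchedByGets(NavigableMap<String, String> table, List<String> keys) {
        int touched = 0;
        for (String key : keys) {
            table.get(key); // each Get reads only its own row
            touched++;
        }
        return touched;
    }

    /** Filtered scan with no seek hints: the filter is evaluated on every row. */
    public static int rowsTouchedByFilteredScan(NavigableMap<String, String> table,
                                                Predicate<String> rowFilter) {
        int touched = 0;
        for (String row : table.keySet()) {
            touched++;
            rowFilter.test(row); // the filter result does not change how many rows are read
        }
        return touched;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 600_000; i++) {
            table.put(String.format("row%07d", i), "v");
        }
        List<String> wanted = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            wanted.add(String.format("row%07d", i * 6_000));
        }
        System.out.println(rowsTouchedByGets(table, wanted));                   // 100
        System.out.println(rowsTouchedByFilteredScan(table, wanted::contains)); // 600000
    }
}
```

Seek hints such as SEEK_NEXT_USING_HINT exist to close this gap by skipping the rows between matches.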
> On Sat, Aug 17, 2013 at 10:34 PM, Kiru Pakkirisamy <[EMAIL PROTECTED]> wrote:
>
>> Ted,
>> On a table with 600K rows, Get'ting 100 rows seems to be faster than the
>> FuzzyRowFilter (mask on the whole length of the key). I thought the
>> FuzzyRowFilter's SEEK_NEXT_USING_HINT would help. All this is on the client
>> side; based on the client-side performance, I have not changed my coprocessor
>> to use the FuzzyRowFilter (it still does multiple Gets inside the
>> coprocessor). Also, I am seeing very bad concurrent query performance. Is
>> there anything that would make coprocessors almost single-threaded across
>> multiple invocations?
>> Again, all this is after moving to 0.94.10 (for HBASE-6870's sake), which seems
>> to be very good at bringing the regions online quickly and balanced. Thanks,
>> much appreciated.
>>
>> Regards,
>> - kiru
>>
>>
>> Kiru Pakkirisamy | webcloudtech.wordpress.com
>>
>>
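
To make "mask on the whole length of the key" concrete: a fuzzy key pairs each byte with a mask flag. HBase's FuzzyRowFilter documents its mask as 0 for a fixed byte and 1 for a byte that may take any value; verify against the Javadoc for your version. The plain-Java sketch below (hypothetical class `FuzzyMaskSketch`, no HBase classes) shows only that matching rule, not the filter's seek-hint computation. With an all-fixed mask, each fuzzy key degenerates into an exact key match, so the filter would likely only compete with Gets if its seeks skip most of the rows between those keys.

```java
import java.nio.charset.StandardCharsets;

public class FuzzyMaskSketch {
    /**
     * True if every fixed byte of fuzzyKey equals the same byte of row.
     * mask[i] == 0: byte i is fixed; mask[i] == 1: byte i may be anything.
     * In this sketch, rows shorter than the fuzzy key never match.
     */
    public static boolean matches(byte[] row, byte[] fuzzyKey, byte[] mask) {
        if (row.length < fuzzyKey.length) {
            return false;
        }
        for (int i = 0; i < fuzzyKey.length; i++) {
            if (mask[i] == 0 && row[i] != fuzzyKey[i]) {
                return false;
            }
        }
        return true;
    }

    public static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "????_42": any 4-byte prefix, then a fixed "_42" suffix.
        byte[] wildcardMask = {1, 1, 1, 1, 0, 0, 0};
        System.out.println(matches(bytes("2013_42"), bytes("????_42"), wildcardMask)); // true
        System.out.println(matches(bytes("2013_43"), bytes("????_42"), wildcardMask)); // false

        // Mask over the whole key length: every byte is fixed, so only rows that
        // start with exactly "2013_42" can match.
        byte[] fullMask = {0, 0, 0, 0, 0, 0, 0};
        System.out.println(matches(bytes("2013_42"), bytes("2013_42"), fullMask)); // true
        System.out.println(matches(bytes("2014_42"), bytes("2013_42"), fullMask)); // false
    }
}
```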
>> ________________________________
>> From: Ted Yu <[EMAIL PROTECTED]>
>> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>> Sent: Saturday, August 17, 2013 4:19 PM
>> Subject: Re: Client Get vs Coprocessor scan performance
>>
>>
>> HBASE-6870 targeted the whole-table scanning done on each coprocessorService
>> call, which showed up in this call chain:
>>
>> HTable#coprocessorService -> getStartKeysInRange -> getStartEndKeys ->
>> getRegionLocations -> MetaScanner.allTableRegions(getConfiguration(),
>> getTableName(), false)
>>
>> With the fix, the cached region locations in HConnectionImplementation are used instead.
>>
>> Cheers
>>
>>
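
The fix described above replaces a .META. scan on every coprocessorService call with lookups against cached region locations. The following is a simplified, hypothetical model of that idea, not HBase's HConnectionImplementation code: a `Supplier` stands in for the MetaScanner call, region start keys map to made-up server names, and a counter shows that repeated calls trigger only one scan until the cache is invalidated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Supplier;

public class RegionLocationCacheSketch {
    private final Supplier<NavigableMap<String, String>> metaScan; // region start key -> server
    private NavigableMap<String, String> cached;                   // null until first use
    private int metaScans = 0;

    public RegionLocationCacheSketch(Supplier<NavigableMap<String, String>> metaScan) {
        this.metaScan = metaScan;
    }

    /** Scans "meta" only on a cache miss; later calls reuse the cached map. */
    private NavigableMap<String, String> locations() {
        if (cached == null) {
            cached = metaScan.get();
            metaScans++;
        }
        return cached;
    }

    /** Drops the cache, e.g. after a region moves or splits. */
    public void invalidate() {
        cached = null;
    }

    public int metaScans() {
        return metaScans;
    }

    /**
     * Start keys of the regions overlapping [startKey, endKey). The first region
     * starts at "", and an empty endKey means "to the end of the table".
     */
    public List<String> startKeysInRange(String startKey, String endKey) {
        NavigableMap<String, String> locs = locations();
        boolean openEnd = endKey.isEmpty();
        if (locs.isEmpty() || (!openEnd && startKey.compareTo(endKey) >= 0)) {
            return new ArrayList<>();
        }
        String first = locs.floorKey(startKey); // region containing startKey
        if (first == null) {
            first = locs.firstKey();
        }
        if (!openEnd && first.compareTo(endKey) >= 0) {
            return new ArrayList<>();
        }
        NavigableMap<String, String> hit = openEnd
                ? locs.tailMap(first, true)
                : locs.subMap(first, true, endKey, false);
        return new ArrayList<>(hit.keySet());
    }

    public static void main(String[] args) {
        RegionLocationCacheSketch cache = new RegionLocationCacheSketch(() -> {
            NavigableMap<String, String> regions = new TreeMap<>();
            regions.put("", "rs1");
            regions.put("g", "rs2");
            regions.put("p", "rs3");
            return regions;
        });
        for (int call = 0; call < 3; call++) {
            System.out.println(cache.startKeysInRange("h", "q")); // [g, p]
        }
        System.out.println(cache.metaScans()); // 1
    }
}
```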
>> On Sat, Aug 17, 2013 at 2:21 PM, Asaf Mesika <[EMAIL PROTECTED]> wrote:
>>
>>> Ted, can you elaborate a little on why this issue boosts performance?
>>> I couldn't figure out from the issue comments whether execCoprocessor scans
>>> the entire .META. table or an entire table, so I could not understand the
>>> actual improvement.
>>>
>>> Thanks!
>>>
>>>
>>>
>>>
>>> On Fri, Aug 9, 2013 at 8:44 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>>>
>>>> I think you need HBASE-6870, which went into 0.94.8.
>>>>
>>>> Upgrading should boost coprocessor performance.
>>>>
>>>> Cheers
>>>>
>>>> On Aug 8, 2013, at 10:21 PM, Kiru Pakkirisamy <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Ted,
>>>>> Here is the method signature/protocol:
>>>>> public Map<String, Double> getFoo(Map<String, Double> input,
>>>>> int topN) throws IOException;
>>>>>
>>>>> There are 31 regions on 4 nodes x 8 CPUs.
>>>>> I am on 0.94.6 (from Hortonworks).
>>>>> I think it behaves as linwukang says: it is almost a full table scan
>>>>> in the coprocessor.
>>>>> Actually, when I set more specific ColumnPrefixFilters, performance went
>>>>> down.
>>>>> I want to do this on the server side because I don't want to be
>>>>> sending 500K columns/values to the client.
>>>>> I cannot believe that a single-threaded client doing some calculations
>>>>> and a group-by beats the coprocessor running in 31 regions.
>>>>>
>>>>> Regards,
>>>>> - kiru
>>>>>
>>>>>
>>>>> Kiru Pakkirisamy | webcloudtech.wordpress.com
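
The endpoint above returns a Map<String, Double> and takes a topN, but the thread does not say how per-region results are combined on the client. Assuming each region returns complete per-key partial sums for its own rows (a group-by), a client-side merge could look like the sketch below; `TopNMergeSketch` and `mergeTopN` are hypothetical names. If each region instead truncated its own result to topN before returning, a key that is moderate in every region but large in total could be dropped, so per-region truncation trades accuracy for less data sent to the client.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TopNMergeSketch {
    /**
     * Sums the per-key partial results returned by each region, then keeps the
     * topN keys with the largest totals, largest first (ties broken by key).
     */
    public static Map<String, Double> mergeTopN(List<Map<String, Double>> perRegion, int topN) {
        Map<String, Double> totals = new HashMap<>();
        for (Map<String, Double> partial : perRegion) {
            partial.forEach((key, value) -> totals.merge(key, value, Double::sum));
        }
        return totals.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Double>comparingByKey()))
                .limit(topN)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        Map<String, Double> region1 = Map.of("a", 1.0, "b", 5.0);
        Map<String, Double> region2 = Map.of("a", 3.0, "c", 2.0);
        System.out.println(mergeTopN(List.of(region1, region2), 2)); // {b=5.0, a=4.0}
    }
}
```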

Kiru Pakkirisamy 2013-08-18, 19:16
James Taylor 2013-08-18, 21:07
Kiru Pakkirisamy 2013-08-18, 21:16
James Taylor 2013-08-19, 00:34
Kiru Pakkirisamy 2013-08-19, 08:36
James Taylor 2013-08-19, 15:34
Kiru Pakkirisamy 2013-08-09, 05:58
Kiru Pakkirisamy 2013-08-09, 20:04
Kiru Pakkirisamy 2013-08-11, 06:15
James Taylor 2013-08-12, 16:41
Kiru Pakkirisamy 2013-08-12, 18:27