
HBase, mail # user - Client Get vs Coprocessor scan performance


Re: Client Get vs Coprocessor scan performance
James Taylor 2013-08-18, 18:44
Would be interesting to compare against Phoenix's Skip Scan
(http://phoenix-hbase.blogspot.com/2013/05/demystifying-skip-scan-in-phoenix.html)
which does a scan through a coprocessor and is more than 2x faster
than multi Get (plus handles multi-range scans in addition to point
gets).
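The skip-scan idea James mentions can be illustrated with a small stand-alone sketch. It is not Phoenix's implementation. A `TreeSet` of `String` keys stands in for a region's sorted rows, and `SkipScanSketch`, `KeyRange`, and `skipScan` are hypothetical names. The sketch shows one forward pass that seeks to the start of each requested range (a point get being a one-key range) instead of doing a separate lookup per key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical illustration of a skip scan over sorted keys (not Phoenix code). */
final class SkipScanSketch {

    /** Half-open key range [start, end). */
    static final class KeyRange {
        final String start;
        final String end;

        KeyRange(String start, String end) {
            this.start = start;
            this.end = end;
        }

        /** Point lookup: key + "\u0000" is the smallest string greater than key. */
        static KeyRange point(String key) {
            return new KeyRange(key, key + "\u0000");
        }
    }

    /** Ranges must be sorted by start key and must not overlap. */
    static List<String> skipScan(NavigableSet<String> rows, List<KeyRange> ranges) {
        List<String> hits = new ArrayList<>();
        for (KeyRange range : ranges) {
            // Seek: jump straight to the first row >= range.start, skipping rows in between.
            String row = rows.ceiling(range.start);
            // Step forward only while still inside the current range.
            while (row != null && row.compareTo(range.end) < 0) {
                hits.add(row);
                row = rows.higher(row);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableSet<String> rows = new TreeSet<>(List.of("a1", "a2", "b1", "b5", "c3", "d9"));
        List<KeyRange> ranges = List.of(new KeyRange("a2", "b2"), KeyRange.point("c3"));
        System.out.println(skipScan(rows, ranges)); // [a2, b1, c3]
    }
}
```

In HBase terms the "seek" would correspond to a filter handing the scanner a hint for the next key to jump to; the sketch only models the key ordering, not RPCs or region boundaries.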

James

On Aug 18, 2013, at 6:39 AM, Ted Yu <[EMAIL PROTECTED]> wrote:

> bq. Get'ting 100 rows seems to be faster than the FuzzyRowFilter (mask on
> the whole length of the key)
>
> In this case the Gets are very selective. The number of rows FuzzyRowFilter
> was evaluated against would be much higher.
> It would be nice if you could share the time each took.
>
> bq. Also, I am seeing very bad concurrent query performance
>
> Were the multi-Gets performed by your coprocessor within the region boundary
> of the respective coprocessor? Just to confirm.
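Ted's question hinges on whether each Get issued inside the coprocessor targets a row in that coprocessor's own region. A minimal sketch of that check, assuming HBase's conventions: a region covers `[startKey, endKey)`, an empty start or end key marks the table's first or last region, and row keys compare as unsigned bytes. `RegionBoundary`, `containsRow`, and `b` are hypothetical names, not HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: is a row key inside a region's [startKey, endKey)? */
final class RegionBoundary {

    static boolean containsRow(byte[] startKey, byte[] endKey, byte[] row) {
        // Empty start key = first region; otherwise row must be >= startKey (unsigned bytes).
        boolean afterStart = startKey.length == 0 || Arrays.compareUnsigned(row, startKey) >= 0;
        // Empty end key = last region; otherwise row must be < endKey (end key is exclusive).
        boolean beforeEnd = endKey.length == 0 || Arrays.compareUnsigned(row, endKey) < 0;
        return afterStart && beforeEnd;
    }

    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] start = b("m");
        byte[] end = b("t");
        System.out.println(containsRow(start, end, b("p")));         // true: inside [m, t)
        System.out.println(containsRow(start, end, b("t")));         // false: end key is exclusive
        System.out.println(containsRow(new byte[0], start, b("a"))); // true: first region
    }
}
```

Comparing as unsigned bytes matters for binary keys: a byte such as `0xFF` sorts after `'t'` here, whereas a signed comparison would place it first.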
>
> bq. that would make Coprocessors almost single-threaded across multiple
> invocations?
>
> Let me dig into the code some more.
>
> Cheers
>
>
> On Sat, Aug 17, 2013 at 10:34 PM, Kiru Pakkirisamy <
> [EMAIL PROTECTED]> wrote:
>
>> Ted,
>> On a table with 600K rows, Get'ting 100 rows seems to be faster than the
>> FuzzyRowFilter (mask on the whole length of the key). I thought the
>> FuzzyRowFilter's SEEK_NEXT_USING_HINT would help. All this is on the client
>> side; based on the client-side performance, I have not changed my
>> coprocessor to use the FuzzyRowFilter (it still does multiple Gets inside
>> the coprocessor). Also, I am seeing very bad concurrent query performance.
>> Is there anything that would make Coprocessors almost single-threaded
>> across multiple invocations?
>> Again, all this is after installing 0.94.10 (for HBASE-6870's sake), which
>> seems to be very good at bringing the regions online quickly and balanced.
>> Thanks, much appreciated.
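A minimal sketch of the per-row test a fuzzy row-key filter performs, using the mask convention of HBase 0.94's `FuzzyRowFilter` (0 = the byte is fixed and must match, 1 = the byte may take any value). `FuzzyMatchSketch` is a hypothetical name. As an assumption of this sketch, the pattern applies to the row-key prefix, so shorter rows never match; the next-row hint the real filter computes for `SEEK_NEXT_USING_HINT` is omitted. Without such hints every row in the scan range is evaluated, which is why Ted expects the filter to touch far more rows than 100 selective Gets.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of fuzzy row-key matching (matching only, no seek hints). */
final class FuzzyMatchSketch {

    static boolean matches(byte[] row, byte[] fuzzyKey, byte[] mask) {
        if (fuzzyKey.length != mask.length) {
            throw new IllegalArgumentException("fuzzy key and mask must have the same length");
        }
        // Assumption of this sketch: the pattern covers a prefix, so shorter rows never match.
        if (row.length < fuzzyKey.length) {
            return false;
        }
        for (int i = 0; i < fuzzyKey.length; i++) {
            // mask 0 = fixed byte (must be equal); mask 1 = wildcard byte (any value).
            if (mask[i] == 0 && row[i] != fuzzyKey[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Pattern "??_x": the first two bytes are free, "_x" is fixed.
        byte[] fuzzyKey = {0, 0, (byte) '_', (byte) 'x'};
        byte[] mask = {1, 1, 0, 0};
        String[] rows = {"aa_x", "ab_y", "zz_x1", "q"};
        int matched = 0;
        for (String row : rows) {
            // Without seek hints, every row is evaluated, whether it matches or not.
            if (matches(row.getBytes(StandardCharsets.US_ASCII), fuzzyKey, mask)) {
                matched++;
            }
        }
        System.out.println(matched + " of " + rows.length + " rows matched"); // 2 of 4 rows matched
    }
}
```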
>>
>> Regards,
>> - kiru
>>
>>
>> Kiru Pakkirisamy | webcloudtech.wordpress.com
>>
>>
>> ________________________________
>> From: Ted Yu <[EMAIL PROTECTED]>
>> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>> Sent: Saturday, August 17, 2013 4:19 PM
>> Subject: Re: Client Get vs Coprocessor scan performance
>>
>>
>> HBASE-6870 targeted the scan of all of the table's region locations that
>> was performed on each coprocessorService call, via this call path:
>>
>> HTable#coprocessorService -> getStartKeysInRange -> getStartEndKeys ->
>> getRegionLocations -> MetaScanner.allTableRegions(getConfiguration(),
>> getTableName(), false)
>>
>> With the fix, the cached region locations in HConnectionImplementation are
>> used instead.
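A hypothetical sketch of the idea behind the fix: work out which regions a coprocessor call spans from region boundaries already cached on the client, rather than re-reading every region of the table from `.META.` on each call. `RegionLocationCache`, `addRegion`, and `startKeysInRange` are illustrative names (the last loosely echoes `getStartKeysInRange` in the call path above but does not reproduce its code). The sketch maps each region's start key to a hosting server name and uses a half-open `[startRow, endRow)` range, with an empty end row meaning the end of the table.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical client-side cache of region boundaries (not HBase's implementation). */
final class RegionLocationCache {
    // Row keys are ordered as unsigned bytes.
    private static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    // Region start key -> name of the server hosting that region.
    private final TreeMap<byte[], String> regions = new TreeMap<>(UNSIGNED);

    void addRegion(byte[] startKey, String server) {
        regions.put(startKey, server);
    }

    /** Start keys of cached regions overlapping [startRow, endRow); empty endRow = table end. */
    List<byte[]> startKeysInRange(byte[] startRow, byte[] endRow) {
        List<byte[]> result = new ArrayList<>();
        // The region holding startRow is the one with the greatest start key <= startRow.
        byte[] from = regions.floorKey(startRow);
        if (from == null) {
            from = startRow; // no cached region covers startRow; consider later regions only
        }
        for (byte[] regionStart : regions.tailMap(from, true).keySet()) {
            // A region starting at or after the exclusive end row lies outside the range.
            if (endRow.length > 0 && UNSIGNED.compare(regionStart, endRow) >= 0) {
                break;
            }
            result.add(regionStart);
        }
        return result;
    }

    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        RegionLocationCache cache = new RegionLocationCache();
        cache.addRegion(new byte[0], "rs1");
        cache.addRegion(b("g"), "rs2");
        cache.addRegion(b("p"), "rs3");
        for (byte[] start : cache.startKeysInRange(b("h"), b("q"))) {
            String key = new String(start, StandardCharsets.UTF_8);
            System.out.println("region starting at \"" + key + "\" on " + cache.regions.get(start));
        }
    }
}
```

With a sorted map the lookup is a `floorKey` plus an in-order walk, so each call costs roughly the number of regions in the range instead of a metadata scan.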
>>
>> Cheers
>>
>>
>> On Sat, Aug 17, 2013 at 2:21 PM, Asaf Mesika <[EMAIL PROTECTED]>
>> wrote:
>>
>>> Ted, can you elaborate a little on why this fix boosts performance?
>>> I couldn't figure out from the issue comments whether execCoprocessor
>>> scans the entire .META. table or an entire table, which I need to know to
>>> understand the actual improvement.
>>>
>>> Thanks!
>>>
>>>
>>>
>>>
>>> On Fri, Aug 9, 2013 at 8:44 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>>>
>>>> I think you need HBASE-6870, which went into 0.94.8.
>>>>
>>>> Upgrading should boost coprocessor performance.
>>>>
>>>> Cheers
>>>>
>>>> On Aug 8, 2013, at 10:21 PM, Kiru Pakkirisamy <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Ted,
>>>>> Here is the method signature/protocol:
>>>>> public Map<String, Double> getFoo(Map<String, Double> input,
>>>>> int topN) throws IOException;
>>>>>
>>>>> There are 31 regions on 4 nodes x 8 CPUs.
>>>>> I am on 0.94.6 (from Hortonworks).
>>>>> I think it behaves as linwukang says: it is almost a full table scan in
>>>>> the coprocessor.
>>>>> Actually, when I set more specific ColumnPrefixFilters, performance went
>>>>> down.
>>>>> I want to do things on the server side because I don't want to send 500K
>>>>> column/values to the client.
>>>>> I cannot believe a single-threaded client that does some calculations
>>>>> and a group-by beats the coprocessor running in 31 regions.
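The `getFoo(..., int topN)` protocol suggests a scatter/gather pattern: each region computes a partial `Map<String, Double>` and the client merges the per-region results. A minimal sketch of that client-side merge, assuming the per-region values are partial sums (the thread does not show Kiru's actual aggregation); `TopNMerge` and `mergeTopN` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical client-side merge of per-region partial sums, keeping the top N keys. */
final class TopNMerge {

    static Map<String, Double> mergeTopN(List<Map<String, Double>> partials, int topN) {
        // Sum the partial value for each key across all regions.
        Map<String, Double> totals = new HashMap<>();
        for (Map<String, Double> partial : partials) {
            partial.forEach((key, value) -> totals.merge(key, value, Double::sum));
        }
        // Largest totals first; ties broken by key so the output is deterministic.
        List<Map.Entry<String, Double>> entries = new ArrayList<>(totals.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Double>comparingByKey()));
        // LinkedHashMap preserves the ranked order in the returned map.
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : entries.subList(0, Math.min(topN, entries.size()))) {
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> region1 = Map.of("a", 5.0, "c", 4.0);
        Map<String, Double> region2 = Map.of("b", 6.0, "c", 4.0);
        // "c" never wins inside a single region, but it has the largest total.
        System.out.println(mergeTopN(List.of(region1, region2), 1)); // {c=8.0}
    }
}
```

The merge keeps every partial entry until the end. If each region instead truncated its result to its own top N before returning, a key that is never a per-region winner (like `c` in `main`) could still have the largest total and would be lost; returning full partials costs more bytes per region but keeps the answer exact.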
>>>>>
>>>>> Regards,
>>>>> - kiru
>>>>>
>>>>>
>>>>> Kiru Pakkirisamy | webcloudtech.wordpress.com