HBase >> mail # dev >> Scanner with explicit columns list is very slow
Re: Scanner with explicit columns list is very slow
I profiled the last test case (5 columns total and 2 in a scan).

80% of StoreScanner.next() execution time is spent in:

StoreScanner.reseek() - 71%
ScanQueryMatcher.getKeyForNextColumn() - 6%
ScanQueryMatcher.getKeyForNextRow() - 2%

Should I open a JIRA?
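For a rough sense of scale, here is a back-of-the-envelope sketch of the best-case gain if reseek() cost nothing. It assumes the 71% figure is an exclusive share of StoreScanner.next() time and that nothing else would change (Amdahl's law). The class and method names are made up for illustration and are not part of HBase.

```java
import java.util.Locale;

// Hypothetical helper, not HBase code: upper bound on speedup when one
// component of the runtime is eliminated entirely (Amdahl's law).
final class ReseekBound {
    /** Returns 1 / (1 - share), the best-case speedup if `share` of the time vanished. */
    static double maxSpeedup(double share) {
        if (share < 0.0 || share >= 1.0) {
            throw new IllegalArgumentException("share must be in [0, 1)");
        }
        return 1.0 / (1.0 - share);
    }

    public static void main(String[] args) {
        // 0.71 is the reseek() share reported by the profiler in this message.
        System.out.printf(Locale.ROOT,
                "best-case speedup of next() without reseek: %.2fx%n", maxSpeedup(0.71));
    }
}
```

With a 71% share the bound is roughly 3.4x for StoreScanner.next() alone, which is the same order as the 4x gap between the two scans in the quoted message below.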
On Mon, Oct 14, 2013 at 2:03 PM, Vladimir Rodionov
<[EMAIL PROTECTED]> wrote:

> I modified the tests:
>
> Now I created a table with one CF and 5 columns: CQ1, ..., CQ5
>
> 1. Scan.addColumn(CF, CQ1);
>     Scan.addColumn(CF, CQ3);
>
> 2. Scan.addFamily(CF);
>
> Scan performance from block cache:
>
> 1. 400K rows per sec
> 2. 1.6M rows per sec
>
> The explicit-columns scan performs even worse in this case. It is
> much faster to scan the WHOLE rows and filter columns later in a Filter
> than to specify columns directly in a Scan.
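The two throughput figures above can be turned into an average per-row cost, which makes the overhead of the explicit-columns path easier to see. This is plain arithmetic on the reported numbers; the helper class is hypothetical and measures nothing itself.

```java
import java.util.Locale;

// Hypothetical helper, not HBase code: converts reported throughput
// into average time per row and compares the two scan variants.
final class ScanCost {
    /** Average cost per row in nanoseconds for a throughput given in rows per second. */
    static double nanosPerRow(double rowsPerSecond) {
        if (rowsPerSecond <= 0.0) {
            throw new IllegalArgumentException("rowsPerSecond must be positive");
        }
        return 1e9 / rowsPerSecond;
    }

    /** How many times slower the first scan is than the second. */
    static double slowdown(double slowRowsPerSecond, double fastRowsPerSecond) {
        return nanosPerRow(slowRowsPerSecond) / nanosPerRow(fastRowsPerSecond);
    }

    public static void main(String[] args) {
        double explicitColumns = nanosPerRow(400_000);   // Scan.addColumn(CF, CQ1/CQ3)
        double wholeFamily = nanosPerRow(1_600_000);     // Scan.addFamily(CF)
        System.out.printf(Locale.ROOT, "explicit columns: %.0f ns/row%n", explicitColumns);
        System.out.printf(Locale.ROOT, "whole family:     %.0f ns/row%n", wholeFamily);
        System.out.printf(Locale.ROOT, "slowdown:         %.1fx%n",
                slowdown(400_000, 1_600_000));
    }
}
```

On these numbers the explicit-columns scan costs 2500 ns per row against 625 ns for the family scan, a 4x slowdown, or about 1875 ns of extra work per row.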
>
> This definitely needs to be explained/investigated.
>
>
> On Mon, Oct 14, 2013 at 11:18 AM, Vladimir Rodionov <
> [EMAIL PROTECTED]> wrote:
>
>> It's 0.94.6, and there is a chance that the issue has already been fixed.
>>
>> Simple table: one column family + one qualifier
>>
>> Two types of scans:
>>
>> 1. Scan.addFamily(CF)
>>
>> 2. Scan.addColumn(CF, CQ)
>>
>> Both run on block cache (all data in memory)
>>
>> Tested on StoreScanner directly.
>>
>> 1. 4.2M KVs per sec per thread
>> 2. 1.5M KVs per sec per thread
>>
>> The difference? The first scanner's ScanQueryMatcher returns INCLUDE, DONE;
>> the second returns INCLUDE_NEXT_ROW, DONE.
>> The cost of the row reseek is huge.
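The difference between the two match-code sequences can be illustrated with a toy model. This is a sketch only: it is not the HBase StoreScanner, the class is invented for illustration, the enum reuses the shorthand names from this thread, and the "reseek" is modelled as a binary search over an in-memory array. It counts operations and makes no claim about real timings.

```java
// Toy model, not HBase code: a sorted list of KVs identified only by row id.
final class ToyScan {
    enum MatchCode { INCLUDE, INCLUDE_NEXT_ROW, DONE }

    int nextCalls; // cheap sequential advances to the adjacent KV
    int reseeks;   // searches for the first KV of the following row
    int included;  // KVs handed back to the caller

    private final int[] rowOfKv; // row id of each KV, sorted ascending

    ToyScan(int rows, int kvsPerRow) {
        rowOfKv = new int[rows * kvsPerRow];
        for (int i = 0; i < rowOfKv.length; i++) {
            rowOfKv[i] = i / kvsPerRow;
        }
    }

    /** Scans all rows; with explicitColumn, only the first KV of each row is wanted. */
    void scan(boolean explicitColumn) {
        nextCalls = reseeks = included = 0;
        int i = 0;
        while (i < rowOfKv.length) {
            int row = rowOfKv[i];
            while (i < rowOfKv.length) {
                MatchCode code;
                if (rowOfKv[i] != row) {
                    code = MatchCode.DONE;             // KV belongs to a later row
                } else if (explicitColumn) {
                    code = MatchCode.INCLUDE_NEXT_ROW; // wanted column found, row finished
                } else {
                    code = MatchCode.INCLUDE;          // whole family: take every KV
                }
                if (code == MatchCode.DONE) {
                    break;
                }
                included++;
                if (code == MatchCode.INCLUDE) {
                    i++;
                    nextCalls++;
                } else {
                    i = seekToRow(row + 1, i + 1);
                    reseeks++;
                }
            }
        }
    }

    /** Lower-bound binary search: first index >= from whose row id is >= targetRow. */
    private int seekToRow(int targetRow, int from) {
        int lo = from;
        int hi = rowOfKv.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (rowOfKv[mid] < targetRow) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        ToyScan t = new ToyScan(1000, 1); // one KV per row, as in the test above
        t.scan(false);
        System.out.println("family scan:   next=" + t.nextCalls + " reseek=" + t.reseeks);
        t.scan(true);
        System.out.println("explicit scan: next=" + t.nextCalls + " reseek=" + t.reseeks);
    }
}
```

With one KV per row, both variants return the same KVs, but the explicit-column variant performs one search per row even though the next row's KV is already the adjacent entry.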
>>
>> Best regards,
>> Vladimir Rodionov
>> Principal Platform Engineer
>> Carrier IQ, www.carrieriq.com
>> e-mail: [EMAIL PROTECTED]
>>
>>
>
>