HBase dev mailing list


Re: Scanner with explicit columns list is very slow
One quick optimization:

There is no need to call reseek on INCLUDE_NEXT_COL: the target is going to be
in the same row and in the same KeyValueScanner (the one currently on top of
the KeyValueHeap).
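The shortcut proposed above can be illustrated with a toy model. This is not HBase code; the class `NextColumnSkip`, its nested `Cell` class, and both methods are hypothetical. It keeps one sorted run of cells ordered the way KeyValues are ordered within a family (row ascending, column ascending, newest timestamp first). `reseekNextColumn` stands in for a reseek by binary-searching for the first cell past the current column, while `skipToNextColumn` stays in the same run and steps forward one cell at a time. The check in `main` only shows that, inside a single sorted scanner, both strategies land on the same cell; it does not model the KeyValueHeap that merges the memstore and store-file scanners.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, not HBase code): one sorted "scanner" over cells
// ordered row asc, column asc, timestamp desc, as KeyValues are within a family.
public class NextColumnSkip {

    static final class Cell {
        final int row;
        final int col;
        final long ts;
        Cell(int row, int col, long ts) { this.row = row; this.col = col; this.ts = ts; }
    }

    // Scan order: row ascending, column ascending, newest timestamp first.
    static int compare(Cell a, Cell b) {
        if (a.row != b.row) return Integer.compare(a.row, b.row);
        if (a.col != b.col) return Integer.compare(a.col, b.col);
        return Long.compare(b.ts, a.ts);
    }

    // Stand-in for a reseek: binary search (from the current position onward)
    // for the first cell that sorts after every version of the current column.
    static int reseekNextColumn(List<Cell> cells, int pos) {
        Cell cur = cells.get(pos);
        // With ts = Long.MIN_VALUE this key sorts after all real versions of the column.
        Cell lastOnColumn = new Cell(cur.row, cur.col, Long.MIN_VALUE);
        int lo = pos, hi = cells.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(cells.get(mid), lastOnColumn) <= 0) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // Proposed shortcut: stay in the same sorted run and step forward
    // until the (row, column) pair changes.
    static int skipToNextColumn(List<Cell> cells, int pos) {
        Cell cur = cells.get(pos);
        int i = pos + 1;
        while (i < cells.size() && cells.get(i).row == cur.row && cells.get(i).col == cur.col) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        // 3 rows x 5 columns x 3 versions, generated already in scan order.
        List<Cell> cells = new ArrayList<>();
        for (int row = 0; row < 3; row++)
            for (int col = 1; col <= 5; col++)
                for (long ts = 3; ts >= 1; ts--)
                    cells.add(new Cell(row, col, ts));

        // Both strategies must land on the same position from every starting cell.
        for (int pos = 0; pos < cells.size(); pos++) {
            if (reseekNextColumn(cells, pos) != skipToNextColumn(cells, pos)) {
                throw new AssertionError("strategies disagree at position " + pos);
            }
        }
        System.out.println("both strategies agree on all " + cells.size() + " positions");
    }
}
```

One design caveat the sketch makes visible: stepping costs one call per skipped version, so for columns with many versions a single seek may still be cheaper than walking forward.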

On Mon, Oct 14, 2013 at 2:46 PM, Vladimir Rodionov <[EMAIL PROTECTED]> wrote:

> I profiled the last test case (5 columns total and 2 in a scan).
>
> 80% of StoreScanner.next() execution time is spent in:
>
> StoreScanner.reseek(): 71%
> ScanQueryMatcher.getKeyForNextColumn(): 6%
> ScanQueryMatcher.getKeyForNextRow(): 2%
>
> Should I open a JIRA?
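A rough upper bound from the profile above, using Amdahl's law. This is a back-of-the-envelope sketch under stated assumptions: StoreScanner.next() is taken to account for essentially all of the scan time, f is the 71% reseek share, and the 400K rows per second measured for the explicit-column scan in the 2:03 PM message is assumed to be the profiled case.

```latex
% f = share of StoreScanner.next() time spent in StoreScanner.reseek() (from the profile)
\[
S_{\max} = \frac{1}{1 - f} = \frac{1}{1 - 0.71} = \frac{1}{0.29} \approx 3.45
\]
% Best case if reseek cost dropped to zero (assumption: next() dominates the scan)
\[
400\,\text{K rows/s} \times 3.45 \approx 1.38\,\text{M rows/s}
\quad\text{vs.}\quad 1.6\,\text{M rows/s for } \texttt{Scan.addFamily(CF)}
\]
```

Under these assumptions, reseek cost alone would explain most, though not all, of the gap between the two scan types.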
>
>
> On Mon, Oct 14, 2013 at 2:03 PM, Vladimir Rodionov <[EMAIL PROTECTED]> wrote:
>
>> I modified the tests:
>>
>> Now I created a table with one CF and 5 columns: CQ1, ..., CQ5.
>>
>> 1. Scan.addColumn(CF, CQ1);
>>     Scan.addColumn(CF, CQ3);
>>
>> 2. Scan.addFamily(CF);
>>
>> Scan performance from the block cache:
>>
>> 1.  400K rows per sec
>> 2.  1.6M rows per sec
>>
>> The explicit-columns scan performance is even worse in this case. It is
>> much faster to scan WHOLE rows and filter columns later in a Filter
>> than to specify the columns directly in the Scan.
>>
>> This definitely needs to be explained/investigated.
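A toy cost model of why the whole-row scan can win in this test. This is not HBase code; `ScanCostModel`, `Cost`, and both method names are hypothetical. The model assumes the explicit-column path ends every cell it lands on with one reseek (to the next wanted column, or past the row after the last one), while the family scan steps through every cell with next() and drops unwanted columns the way a Filter would. It counts operations only and does not time anything; for the explicit path only reseeks are counted.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Toy cost model (hypothetical, not HBase code). Each row holds columns
// 1..columnsPerRow of one family; we count sequential next() steps vs. reseeks.
public class ScanCostModel {

    static final class Cost {
        final long nextCalls;
        final long reseeks;
        final long cellsReturned;
        Cost(long nextCalls, long reseeks, long cellsReturned) {
            this.nextCalls = nextCalls;
            this.reseeks = reseeks;
            this.cellsReturned = cellsReturned;
        }
        @Override public String toString() {
            return "next()=" + nextCalls + ", reseeks=" + reseeks + ", returned=" + cellsReturned;
        }
    }

    // Like Scan.addFamily(CF) plus a column filter: visit every cell with next()
    // and keep only the wanted columns.
    static Cost familyScanWithFilter(int rows, int columnsPerRow, TreeSet<Integer> wanted) {
        long next = 0, returned = 0;
        for (int r = 0; r < rows; r++) {
            for (int c = 1; c <= columnsPerRow; c++) {
                next++;
                if (wanted.contains(c)) returned++;
            }
        }
        return new Cost(next, 0, returned);
    }

    // Like one Scan.addColumn(CF, CQ) per wanted column: every cell the scanner
    // lands on ends with a reseek, to the next wanted column or past the row.
    // Only reseeks are counted on this path.
    static Cost explicitColumnScan(int rows, int columnsPerRow, TreeSet<Integer> wanted) {
        long reseeks = 0, returned = 0;
        for (int r = 0; r < rows; r++) {
            int c = 1; // a row is entered at its first column
            while (true) {
                if (wanted.contains(c)) returned++;
                Integer hint = wanted.higher(c); // next wanted column, if any
                reseeks++;                       // seek to the hint, or past the row
                if (hint == null || hint > columnsPerRow) break;
                c = hint;
            }
        }
        return new Cost(0, reseeks, returned);
    }

    public static void main(String[] args) {
        TreeSet<Integer> wanted = new TreeSet<>(Arrays.asList(1, 3)); // CQ1 and CQ3
        int rows = 1_000;
        System.out.println("addFamily + filter: " + familyScanWithFilter(rows, 5, wanted));
        System.out.println("explicit columns:   " + explicitColumnScan(rows, 5, wanted));
    }
}
```

For CQ1 and CQ3 out of five columns, the model gives 5 next() calls and no reseeks per row for the family scan versus 2 reseeks per row for the explicit-column scan. The explicit scan would therefore only pay off if a reseek cost less than about 2.5 next() calls, and the throughput numbers above suggest it costs considerably more. With columnsPerRow = 1 and wanted = {1}, the same model gives one reseek per row, as in the single-column test described in the first message.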
>>
>>
>> On Mon, Oct 14, 2013 at 11:18 AM, Vladimir Rodionov <
>> [EMAIL PROTECTED]> wrote:
>>
>>> It's 0.94.6, and there is a chance that the issue has already been fixed.
>>>
>>> Simple table: one column family + one qualifier
>>>
>>> Two types of scans:
>>>
>>> 1. Scan.addFamily(CF)
>>>
>>> 2. Scan.addColumn(CF, CQ)
>>>
>>> Both run from the block cache (all data in memory).
>>>
>>> Tested on StoreScanner directly.
>>>
>>> 1. 4.2M KVs per second per thread
>>> 2. 1.5M KVs per second per thread
>>>
>>> The difference? The first scanner's ScanQueryMatcher returns INCLUDE, DONE;
>>> the second returns INCLUDE_NEXT_ROW, DONE.
>>> The cost of the row reseek is huge.
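A back-of-the-envelope estimate of the per-row reseek cost from the two throughput numbers above. It assumes the table holds exactly one KV per row, so the explicit-column scan performs one row reseek per KV, and that everything else costs the same in both scans; t denotes time per KV on one thread.

```latex
\[
t_{\text{family}} = \frac{1}{4.2 \times 10^{6}\ \text{KV/s}} \approx 238\ \text{ns},
\qquad
t_{\text{column}} = \frac{1}{1.5 \times 10^{6}\ \text{KV/s}} \approx 667\ \text{ns}
\]
\[
\frac{t_{\text{column}}}{t_{\text{family}}} = \frac{4.2}{1.5} = 2.8,
\qquad
t_{\text{reseek}} \approx t_{\text{column}} - t_{\text{family}} \approx 429\ \text{ns per row}
\]
```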
>>>
>>> Best regards,
>>> Vladimir Rodionov
>>> Principal Platform Engineer
>>> Carrier IQ, www.carrieriq.com
>>> e-mail: [EMAIL PROTECTED]
>>>
>>>
>>>
>>
>>
>