HBase, mail # dev - Scanner with explicit columns list is very slow


Re: Scanner with explicit columns list is very slow
Vladimir Rodionov 2013-10-15, 05:28
Yes, I load data into HRegion (with CACHE_ON_WRITE), then call flushcache()
(so there is no data in the memstore).

This is what I found: the default implementation of ExplicitColumnMatcher
is (possibly) tuned for very large rows - I would say, very large. We need a
hint on the Scan that tells StoreScanner which strategy to use:

1. ExplicitColumnMatcher with reseeks (what we have currently) for very
large rows.
2. For small/medium rows: remove the explicit columns/families from the
Scan and replace them with an additional filter that keeps the Scan's
columnFamilyMap and verifies that every KV matches this map (a rough
sketch follows below).
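
Roughly, such a filter could look like this (a simplified sketch against the
0.94 Filter API, not the actual code; the constructor and the stubbed-out
Writable methods are illustrative only):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Map;
import java.util.NavigableSet;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.filter.FilterBase;

// Keeps the Scan's family -> qualifiers map (as returned by Scan.getFamilyMap(),
// i.e. a TreeMap/TreeSet ordered by Bytes.BYTES_COMPARATOR, so byte[] lookups
// compare by content) and checks every KV against it.
public class ExplicitScanReplacementFilter extends FilterBase {

  private Map<byte[], NavigableSet<byte[]>> familyMap;

  public ExplicitScanReplacementFilter(Map<byte[], NavigableSet<byte[]>> familyMap) {
    this.familyMap = familyMap;
  }

  @Override
  public ReturnCode filterKeyValue(KeyValue kv) {
    if (!familyMap.containsKey(kv.getFamily())) {
      return ReturnCode.SKIP;                    // family was not requested
    }
    NavigableSet<byte[]> qualifiers = familyMap.get(kv.getFamily());
    if (qualifiers == null || qualifiers.isEmpty()
        || qualifiers.contains(kv.getQualifier())) {
      return ReturnCode.INCLUDE;                 // whole family or a requested column
    }
    return ReturnCode.SKIP;                      // same family, unwanted qualifier
  }

  // Writable serialization (required by the 0.94 Filter contract) is stubbed
  // out here; a real filter has to (de)serialize the family map.
  @Override
  public void write(DataOutput out) throws IOException { }

  @Override
  public void readFields(DataInput in) throws IOException { }
}

To use it, build the filter from scan.getFamilyMap(), then strip the explicit
columns from the Scan (addFamily only), so StoreScanner runs the cheap
wildcard path and the filter does the column selection.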

I have created such a filter (ExplicitScanReplacementFilter) and verified
that it works much better than case 1 for small rows. For 1 CF + 5 CQs and a
Scan with 2 CQs I get:

400K rows per sec with the default implementation
1.25M rows per sec with ExplicitScanReplacementFilter

I will optimize ExplicitScanReplacementFilter even more and will probably
get to 1.4-1.5M rows per sec tomorrow.
We need a JIRA; I will open one tomorrow.

-Vladimir
On Mon, Oct 14, 2013 at 9:38 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Interesting. Thanks for doing the testing/profiling Vladimir!
>
>
> Generally reseeks are better if they can skip many KVs.
>
> For example if you have many versions of the same row/col,
> INCLUDE_NEXT_COL will be better than issuing many INCLUDEs, same with
> INCLUDE_NEXT_ROW if there are many columns.
>
> Since the number of columns/versions is not known at scan time (and can in
> fact vary between rows) it is hard to always do the right thing. It also
> depends on how large the KVs are on average. So replacing INCLUDE_NEXT_XXX
> with INCLUDE is not always the right idea.
>
>
> Thinking aloud... We could take the VERSIONS setting of the column family
> into account as a guideline for the expected number of versions (but
> there's no guarantee about how many versions we'll actually have until we
> have had a compaction), and replace INCLUDE_NEXT_COL with INCLUDE if VERSIONS is
> small (maybe < 10 or so). Maybe that'd be worth a jira...
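>
> Roughly, the heuristic could look like this (an illustrative sketch only,
> nothing like it exists in the code; enum names follow
> ScanQueryMatcher.MatchCode, VERSIONS comes from the HColumnDescriptor):
>
> // With few declared versions a plain INCLUDE (keep scanning forward) is
> // cheaper than building a fake KeyValue and reseeking; with many versions
> // the reseek pays off. 'family' is the HColumnDescriptor of the scanned CF.
> int maxVersions = family.getMaxVersions();
> ScanQueryMatcher.MatchCode code = (maxVersions < 10)
>     ? ScanQueryMatcher.MatchCode.INCLUDE
>     : ScanQueryMatcher.MatchCode.INCLUDE_AND_SEEK_NEXT_COL;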
>
>
> There are some fixes in 0.94.12 (HBASE-8930, avoid a superfluous reseek in
> some cases), and HBASE-9732 might help in 0.94.13 (avoid memory fences on
> a volatile on each seek/reseek).
>
> It also would be nice to figure out why reseek is so much more expensive.
> If the KV we reseek to is on the same block it should just scan forward,
> otherwise it'll look in the appropriate block. It is probably the creation
> of the fake KV we want to seek to (like firstOnRow, lastOnRow, etc), in
> which case there's not much we can do.
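>
> For reference, the kind of fake key that gets built for each reseek looks
> roughly like this (simplified sketch; the real logic lives in
> ScanQueryMatcher.getKeyForNextColumn/getKeyForNextRow):
>
> // 'nextColumn' stands in for the next requested qualifier; every reseek
> // allocates a fresh fake KeyValue before the heap is repositioned.
> KeyValue fake = KeyValue.createFirstOnRow(kv.getRow(), kv.getFamily(), nextColumn);
> heap.reseek(fake);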
>
>
> Lastly, I've not spent much time profiling the ExplicitColumnMatcher yet;
> looks like I should start doing that.
>
>
> So in your case everything is in the blockcache, no data in the memstore?
>
> -- Lars
>
>
>
> ________________________________
>  From: Vladimir Rodionov <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Sent: Monday, October 14, 2013 2:49 PM
> Subject: Re: Scanner with explicit columns list is very slow
>
>
> One fast optimization:
>
> There is no need to call reseek on INCLUDE_NEXT_COL - this is going to be
> the same row in the same KeyValueScanner (currently on top of
> KeyValueHeap).
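>
> A rough sketch of what I mean (illustrative only, not a patch; the real
> change would live in StoreScanner and use the comparator rather than the
> copying getRow()/getQualifier() calls):
>
> // On INCLUDE_NEXT_COL just roll the KeyValueHeap forward past the remaining
> // versions of the current column instead of building a fake KeyValue and
> // reseeking. 'kv' is the current KeyValue, 'heap' the KeyValueHeap.
> KeyValue cur = heap.peek();
> while (cur != null
>     && Bytes.equals(kv.getRow(), cur.getRow())
>     && Bytes.equals(kv.getQualifier(), cur.getQualifier())) {
>   heap.next();       // skip another version of the same column
>   cur = heap.peek();
> }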
>
>
>
>
>
> On Mon, Oct 14, 2013 at 2:46 PM, Vladimir Rodionov
> <[EMAIL PROTECTED]>wrote:
>
> > I profiled the last test case (5 columns total and 2 in a scan).
> >
> > 80% of StoreScanner.next() execution time is spent in:
> >
> > StoreScanner.reseek() - 71%
> > ScanQueryMatcher.getKeyForNextColumn() - 6%
> > ScanQueryMatcher.getKeyForNextRow() - 2%
> >
> > Should I open a JIRA?
> >
> >
> > On Mon, Oct 14, 2013 at 2:03 PM, Vladimir Rodionov <
> [EMAIL PROTECTED]
> > > wrote:
> >
> >> I modified the tests:
> >>
> >> Now I created a table with one CF and 5 columns: CQ1, ..., CQ5.
> >>
> >> 1. Scan.addColumn(CF, CQ1);
> >>     Scan.addColumn(CF, CQ3);
> >>
> >> 2. Scan.addFamily(CF);
> >>
> >> Scan performance from block cache:
> >>
> >> 1.  400K rows per sec
> >> 2.  1.6M rows per sec
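> >>
> >> For reference, the two setups look roughly like this at the client API
> >> level (illustrative only - the actual test drives an HRegion directly;
> >> classes are from org.apache.hadoop.hbase.client and
> >> org.apache.hadoop.hbase.util.Bytes, and the table name and connection
> >> handling here are placeholders):
> >>
> >> byte[] CF  = Bytes.toBytes("CF");
> >> byte[] CQ1 = Bytes.toBytes("CQ1");
> >> byte[] CQ3 = Bytes.toBytes("CQ3");
> >>
> >> // 1. explicit columns: exercises the explicit-column matching / reseek path
> >> Scan explicit = new Scan();
> >> explicit.addColumn(CF, CQ1);
> >> explicit.addColumn(CF, CQ3);
> >>
> >> // 2. whole family: no per-column matching at all
> >> Scan wholeFamily = new Scan();
> >> wholeFamily.addFamily(CF);
> >>
> >> // count the rows each scan returns ('conf' is a Configuration,
> >> // "testtable" a placeholder table name)
> >> HTable table = new HTable(conf, "testtable");
> >> ResultScanner rs = table.getScanner(explicit);   // or wholeFamily
> >> long rows = 0;
> >> for (Result r = rs.next(); r != null; r = rs.next()) {
> >>   rows++;
> >> }
> >> rs.close();
> >> table.close();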