HBase, mail # dev - Beware of PREFIX_TREE block encoding


Vladimir Rodionov 2013-10-20, 02:32
Vladimir Rodionov 2013-10-20, 02:34
lars hofhansl 2013-10-20, 03:50
Vladimir Rodionov 2013-10-20, 04:08
lars hofhansl 2013-10-20, 04:12
Vladimir Rodionov 2013-10-20, 05:45
Re: Beware of PREFIX_TREE block encoding
Jean-Marc Spaggiari 2013-10-20, 11:06
Vladimir, any chance to run the same test with FAST_DIFF?

J
2013/10/20 Vladimir Rodionov <[EMAIL PROTECTED]>

> I wanted to try PREFIX_TREE because it is supposed to be fastest on
> seek/reseek.
>
>
> On Sat, Oct 19, 2013 at 9:12 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
> > I found FAST_DIFF to be the fastest of the block encoders.
> > (Prefix tree is in 0.96+ only as far as I know.)
> >
> > -- Lars
> >
> >
> >
> > ----- Original Message -----
> > From: Vladimir Rodionov <[EMAIL PROTECTED]>
> > To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; lars hofhansl <
> > [EMAIL PROTECTED]>
> > Cc:
> > Sent: Saturday, October 19, 2013 9:08 PM
> > Subject: Re: Beware of PREFIX_TREE block encoding
> >
> > *Now, which encoder did you test specifically? I have seen a 20-40%
> > slowdown when everything is in the blockcache (which is the worst case
> > scenario here), certainly not a 10x slowdown.*
> >
> > I have 1.3M rows (very small - 48 bytes) in a block cache which I read
> > sequentially, using encoding NONE, PREFIX_TREE and
> > StoreScanner/StoreFileScanner (close to metal - block cache :)
> >
> > Time to read all 1.3M rows reported in ms.
> >
> > encoding = NONE,        scanner = StoreScanner;     time = 300 ms
> > encoding = PREFIX_TREE, scanner = StoreScanner;     time = 860 ms
> > encoding = NONE,        scanner = StoreFileScanner; time = 52 ms
> > encoding = PREFIX_TREE, scanner = StoreFileScanner; time = 545 ms
> >
> > -Vladimir
> >
> >
> >
> >
> > On Sat, Oct 19, 2013 at 8:50 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> >
> > > That is (unfortunately) a known issue. The main problem is that HBase
> > > expects each KV to be backed by a contiguous byte[]. For any prefix
> > > encoding it is thus necessary to rematerialize the KV (i.e. copy all the
> > > partial bytes into a new location).
> > > That is inefficient. Nobody has taken on fixing this (we're halfway
> > > there with Cells in 0.96, though).
> > >
> > > There are JIRAs out there to fix this, like HBASE-7320 and more recently
> > > HBASE-9794.
> > >
> > > Now, which encoder did you test specifically? I have seen a 20-40%
> > > slowdown when everything is in the blockcache (which is the worst case
> > > scenario here), certainly not a 10x slowdown.
> > >
> > > Note that with block encoding the blocks are stored encoded in the
> > > blockcache, so more data fits into the cache, and (obviously) there's
> > > less IO when the data is not in the cache. So the extra CPU cycles and
> > > memory bandwidth used are offset by that.
> > >
> > > There are other problems too. I just filed an issue (HBASE-9807) where
> > > with block encoders we make a copy of the key portion of the KV on each
> > > reseek, just to compare it to the current scan key.
> > >
> > > -- Lars
> > > ________________________________
> > > From: Vladimir Rodionov <[EMAIL PROTECTED]>
> > > To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> > > Sent: Saturday, October 19, 2013 7:34 PM
> > > Subject: RE: Beware of PREFIX_TREE block encoding
> > >
> > >
> > > What did I want to say by this? HBase still does not have a block
> > > encoding that is optimal for both scan and seek (re-seek).
> > > I do not think these goals are mutually exclusive.
> > >
> > >
> > > Best regards,
> > > Vladimir Rodionov
> > > Principal Platform Engineer
> > > Carrier IQ, www.carrieriq.com
> > > e-mail: [EMAIL PROTECTED]
> > >
> > > ________________________________________
> > >
> > > From: Vladimir Rodionov [[EMAIL PROTECTED]]
> > > Sent: Saturday, October 19, 2013 7:32 PM
> > > To: [EMAIL PROTECTED]
> > > Subject: Beware of PREFIX_TREE block encoding
> > >
> > > The scan performance is bad: 10x slower in my tests than for blocks with
> > > NONE encoding. I scan data directly from the block cache through
> > > StoreFileScanner (bypassing all the StoreScanner/KeyValueHeap stuff). It
> > > should be clearly stated that this encoding degrades overall performance
> > > significantly in favor of data size reduction and is suitable only for
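
To make the rematerialization cost mentioned in the quoted thread concrete, here is a minimal Java sketch of a toy prefix encoding. It is not HBase's PREFIX_TREE or FAST_DIFF format, and the class and method names (PrefixRematerializeSketch, Entry, encode, decode) are hypothetical, made up for illustration only. It assumes keys arrive sorted and that the reader needs every key as its own contiguous byte[], which is the constraint described above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy prefix encoding for illustration; NOT an HBase on-disk format. */
final class PrefixRematerializeSketch {

    /** One encoded entry: how many bytes are shared with the previous key, plus the rest. */
    static final class Entry {
        final int sharedLen;
        final byte[] suffix;

        Entry(int sharedLen, byte[] suffix) {
            this.sharedLen = sharedLen;
            this.suffix = suffix;
        }
    }

    /** Encode sorted keys by dropping the prefix each key shares with its predecessor. */
    static List<Entry> encode(List<byte[]> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            int shared = 0;
            int max = Math.min(prev.length, key.length);
            while (shared < max && prev[shared] == key[shared]) {
                shared++;
            }
            out.add(new Entry(shared, Arrays.copyOfRange(key, shared, key.length)));
            prev = key;
        }
        return out;
    }

    /**
     * Decode into one contiguous byte[] per key. Every key costs a fresh
     * allocation plus two copies (shared prefix and suffix); an unencoded
     * block could instead hand out an offset/length into the block buffer.
     */
    static List<byte[]> decode(List<Entry> entries) {
        List<byte[]> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (Entry e : entries) {
            byte[] key = new byte[e.sharedLen + e.suffix.length];
            System.arraycopy(prev, 0, key, 0, e.sharedLen);
            System.arraycopy(e.suffix, 0, key, e.sharedLen, e.suffix.length);
            out.add(key);
            prev = key;
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> keys = new ArrayList<>();
        for (String s : new String[] {"row-0001", "row-0002", "row-0010"}) {
            keys.add(s.getBytes(StandardCharsets.UTF_8));
        }
        for (byte[] k : decode(encode(keys))) {
            System.out.println(new String(k, StandardCharsets.UTF_8));
        }
    }
}
```

With small rows and everything already in the block cache, that per-key allocation and copy is pure overhead on a sequential scan, which is consistent with the direction of the numbers reported above, though this sketch says nothing about their magnitude.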
Vladimir Rodionov 2013-10-20, 17:06
Matt Corgan 2013-10-21, 22:01
Ted Yu 2013-10-20, 03:33
Vladimir Rodionov 2013-10-20, 03:50