HBase >> mail # dev >> Beware of PREFIX_TREE block encoding


Re: Beware of PREFIX_TREE block encoding
Vladimir, any chance to run the same test with FAST_DIFF?

J
2013/10/20 Vladimir Rodionov <[EMAIL PROTECTED]>

> I wanted to try PREFIX_TREE because it is supposed to be fastest on
> seek/reseek.
>
>
> On Sat, Oct 19, 2013 at 9:12 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
> > I found FAST_DIFF to be the fastest of the block encoders.
> > (Prefix tree is in 0.96+ only as far as I know.)
> >
> > -- Lars
> >
> >
> >
> > ----- Original Message -----
> > From: Vladimir Rodionov <[EMAIL PROTECTED]>
> > To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
> > Cc:
> > Sent: Saturday, October 19, 2013 9:08 PM
> > Subject: Re: Beware of PREFIX_TREE block encoding
> >
> > *Now, which encoder did you test specifically? I've seen a 20-40% slowdown
> > when everything is in the blockcache (which is the worst case scenario
> > here), certainly not a 10x slowdown.*
> >
> > I have 1.3M rows (very small - 48 bytes) in the block cache, which I read
> > sequentially using encodings NONE and PREFIX_TREE, through both
> > StoreScanner and StoreFileScanner (close to metal - block cache :)
> >
> > Time to read all 1.3M rows, reported in ms:
> >
> > encoding = NONE,        scanner = StoreScanner;     time = 300 ms
> > encoding = PREFIX_TREE, scanner = StoreScanner;     time = 860 ms
> > encoding = NONE,        scanner = StoreFileScanner; time = 52 ms
> > encoding = PREFIX_TREE, scanner = StoreFileScanner; time = 545 ms
> >
> > -Vladimir
> >
> >
> >
> >
> > On Sat, Oct 19, 2013 at 8:50 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> >
> > > That is (unfortunately) a known issue. The main problem is that HBase
> > > expects each KV to be backed by a contiguous byte[]. For any prefix
> > > encoding it is thus necessary to rematerialize the KV (i.e. copy all the
> > > partial bytes into a new location).
> > > That is inefficient. Nobody has taken on fixing this (we're halfway there
> > > with Cells in 0.96, though).
> > >
> > > There are JIRAs out there to fix this, like HBASE-7320 and more recently
> > > HBASE-9794.
> > >
> > > Now, which encoder did you test specifically? I've seen a 20-40% slowdown
> > > when everything is in the blockcache (which is the worst case scenario
> > > here), certainly not a 10x slowdown.
> > >
> > > Note that with block encoding the blocks are stored encoded in the
> > > blockcache, so more data fits into the cache, and (obviously) there's
> > > less IO when the data is not in the cache. So the extra CPU cycles and
> > > memory bandwidth used are offset by that.
> > >
> > > There are other problems too. I just filed an issue (HBASE-9807) where,
> > > with block encoders, we make a copy of the key portion of the KV on each
> > > reseek, just to compare it to the current scan key.
> > >
> > > -- Lars
> > > ________________________________
> > > From: Vladimir Rodionov <[EMAIL PROTECTED]>
> > > To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> > > Sent: Saturday, October 19, 2013 7:34 PM
> > > Subject: RE: Beware of PREFIX_TREE block encoding
> > >
> > >
> > > What did I want to say by this? HBase still does not have a block
> > > encoding that is optimal for both scan and seek (re-seek).
> > > I do not think these goals are mutually exclusive.
> > >
> > >
> > > Best regards,
> > > Vladimir Rodionov
> > > Principal Platform Engineer
> > > Carrier IQ, www.carrieriq.com
> > > e-mail: [EMAIL PROTECTED]
> > >
> > > ________________________________________
> > >
> > > From: Vladimir Rodionov [[EMAIL PROTECTED]]
> > > Sent: Saturday, October 19, 2013 7:32 PM
> > > To: [EMAIL PROTECTED]
> > > Subject: Beware of PREFIX_TREE block encoding
> > >
> > > The scan performance is bad: 10x slower in my tests than for blocks with
> > > NONE encoding. I scan data directly from block cache through
> > > StoreFileScanner (bypassing all StoreScanner/KeyValueHeap stuff). It should
> > > be clearly stated that this encoding degrades overall performance
> > > significantly in favor of data size reduction and is suitable only for
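
Editor's note: the rematerialization cost Lars describes in the quoted message (each KV must be backed by a contiguous byte[], so a prefix-encoded entry has to be copied into a new array before it can be used) can be illustrated with a minimal sketch. This is not HBase code. The class name, the Entry type, and the simple "shared prefix length plus suffix" layout are assumptions made for illustration only; the actual PREFIX_TREE and FAST_DIFF encoders use different, more elaborate formats.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical toy prefix encoder; illustrates the per-entry copy, not HBase's real format. */
public class PrefixRematerializeSketch {

    /** One encoded entry: bytes shared with the previous key, plus the remaining suffix. */
    static final class Entry {
        final int sharedLen;
        final byte[] suffix;

        Entry(int sharedLen, byte[] suffix) {
            this.sharedLen = sharedLen;
            this.suffix = suffix;
        }
    }

    /** Encodes sorted keys by storing only the part that differs from the previous key. */
    static List<Entry> encode(List<byte[]> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            int shared = 0;
            int max = Math.min(prev.length, key.length);
            while (shared < max && prev[shared] == key[shared]) {
                shared++;
            }
            out.add(new Entry(shared, Arrays.copyOfRange(key, shared, key.length)));
            prev = key;
        }
        return out;
    }

    /**
     * Decodes entries back into full keys. Every key needs a freshly allocated,
     * contiguous byte[] and two copies (prefix + suffix): this is the extra work
     * an unencoded block avoids, because there the key bytes are already contiguous.
     */
    static List<byte[]> decode(List<Entry> entries) {
        List<byte[]> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (Entry e : entries) {
            byte[] full = new byte[e.sharedLen + e.suffix.length];
            System.arraycopy(prev, 0, full, 0, e.sharedLen);
            System.arraycopy(e.suffix, 0, full, e.sharedLen, e.suffix.length);
            out.add(full);
            prev = full;
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> keys = new ArrayList<>();
        for (String s : new String[] {"row0001/cf:a", "row0001/cf:b", "row0002/cf:a"}) {
            keys.add(s.getBytes(StandardCharsets.UTF_8));
        }
        for (byte[] k : decode(encode(keys))) {
            System.out.println(new String(k, StandardCharsets.UTF_8));
        }
    }
}
```

Running main prints the three original keys. The point of the sketch is the allocation and the two System.arraycopy calls inside decode: with an unencoded block a scanner can point at bytes that are already laid out contiguously, while any prefix-style encoding pays this copy for every entry it returns.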