HBase, mail # user - Slow Get Performance (or how many disk I/O does it take for one non-cached read?)


Re: Slow Get Performance (or how many disk I/O does it take for one non-cached read?)
Ted Yu 2014-02-01, 05:37
I realized that after hitting the Send button :-)

And 0.94.17 is around the corner, right?
On Fri, Jan 31, 2014 at 9:27 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> 0.94.16 is out already :)
>
>
>
> ----- Original Message -----
> From: Ted Yu <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Cc:
> Sent: Friday, January 31, 2014 8:28 PM
> Subject: Re: Slow Get Performance (or how many disk I/O does it take for
> one non-cached read?)
>
> For #4,
> bq. has this shortcut enabled by default
>
> Inline checksum is different from short circuit read. Inline checksum is
> enabled by default in 0.96 and later releases - see HBASE-8322
>
> Meanwhile, you can consider upgrading to 0.94.15 - there have been quite
> a few improvements since 0.94.6
>
> Cheers
>
>
>
> On Fri, Jan 31, 2014 at 6:38 PM, Jan Schellenberger <[EMAIL PROTECTED]> wrote:
>
> > Thank you.  I will have to test these things one at a time.
> >
> > I re-enabled compression (SNAPPY for now) and changed the block encoding
> > to FAST_DIFF.
> >
> > #1 I will try GZ compression.
> > #2 The block cache size is already at 0.4. I will try to increase it a bit
> > more, but I will never get the whole set into memory.
> > I will disable the bloom filter.
> >
> > #4 I will investigate this.  I thought I read somewhere that Cloudera 4.3
> > has this shortcut enabled by default, but I will try to verify.
> >
> > #3 I'm not sure I understand this suggestion - are you suggesting custom
> > region splitting?  Each region is fully compacted so there is only
> > one HFile.  The queries I do are: "get me the most recent versions, up to
> > 200".  However, I need to store more versions, because I may ask "get me
> > the most recent versions, up to 200, that I would have seen yesterday".
> >
> >
> > #5 HDFS short circuit is already enabled by default.
> > #6 Yes, SSD would clearly be better.
> >
> > #7 The average result of the get is fairly small, no more than 1 kB I'd
> > say.
> > We do hit each key with roughly the same probability.
> >
> >
> >
> > I'm concerned about the block cache... It sounds like the wrong blocks
> > are being cached.  I thought there was a preference to cache index and
> > bloom blocks.
> >
> > I'm currently running 60 queries/second on one node and it's reporting
> > blockCacheHitRatio=29% and blockCacheHitCachingRatio=65% (not sure what
> > the difference is).
> >
> > I also see rootIndexSize=122k totalStaticIndexSize=88MB and
> > totalStaticBloomSize=80MB (will disable bloom filters in the next run of
> > this).
> > hdfslocality=100%
> >
> >
> >
> >
> >
> > --
> > View this message in context:
> > http://apache-hbase.679495.n3.nabble.com/Slow-Get-Performance-or-how-many-disk-I-O-does-it-take-for-one-non-cached-read-tp4055545p4055554.html
> > Sent from the HBase User mailing list archive at Nabble.com.
> >
>
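The query pattern described under #3 in the quoted message ("the most recent versions, up to 200, that I would have seen yesterday") can be illustrated with a small stand-alone model. This is only a sketch in plain Python, not HBase client code: the function name `latest_versions_as_of` and the in-memory list of (timestamp, value) pairs are hypothetical stand-ins for the versions stored under one cell. In the HBase Java client the same effect is normally obtained by combining `Get.setMaxVersions` with `Get.setTimeRange`, whose upper bound is exclusive.

```python
from typing import List, Tuple


def latest_versions_as_of(
    versions: List[Tuple[int, bytes]],
    as_of_ts: int,
    max_versions: int = 200,
) -> List[Tuple[int, bytes]]:
    """Return up to max_versions entries with timestamp < as_of_ts, newest first."""
    # Keep only versions that already existed at the cutoff (exclusive bound).
    visible = [v for v in versions if v[0] < as_of_ts]
    # Newest first, mirroring the order in which versions are returned.
    visible.sort(key=lambda v: v[0], reverse=True)
    return visible[:max_versions]


# Hypothetical data: 500 versions written at timestamps 1..500.
cell = [(ts, f"v{ts}".encode()) for ts in range(1, 501)]

# "Most recent versions, up to 200" as seen now.
now = latest_versions_as_of(cell, as_of_ts=501)

# The same question as it would have been answered at timestamp 401.
yesterday = latest_versions_as_of(cell, as_of_ts=401)
```

In this toy data set the "yesterday" query needs versions older than the 200 most recent ones, which is why the table has to retain more than 200 versions per cell.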
>
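The block cache figures quoted in the thread can be turned into a back-of-the-envelope estimate. The sketch below assumes uniform key access (as stated in the message), that index and bloom blocks stay resident in the cache, and that every data block is equally likely to be requested. Under those assumptions the expected data-block hit ratio is roughly the cache space left for data divided by the size of the data as it would occupy the cache. The heap size (8 GB) and data size (100 GB) are not given anywhere in the thread and are purely hypothetical; the 0.4 cache fraction, 88 MB index size and 80 MB bloom size are the reported values.

```python
def estimated_data_hit_ratio(heap_mb, cache_fraction, index_mb, bloom_mb, data_mb):
    """Rough data-block hit ratio under uniform access (a simplifying assumption)."""
    # Space the block cache gets out of the region server heap.
    cache_mb = heap_mb * cache_fraction
    # Assume index and bloom blocks are always cached; the rest holds data blocks.
    room_for_data_mb = max(cache_mb - index_mb - bloom_mb, 0.0)
    # With uniform access, the hit ratio is the cached share of the data set.
    return min(room_for_data_mb / data_mb, 1.0)


# Hypothetical sizing: 8 GB heap, 100 GB of data on the node.
with_bloom = estimated_data_hit_ratio(8192, 0.4, 88, 80, 102400)
without_bloom = estimated_data_hit_ratio(8192, 0.4, 88, 0, 102400)

print(f"with bloom filters:    {with_bloom:.1%}")
print(f"without bloom filters: {without_bloom:.1%}")
```

With these made-up sizes, dropping the bloom filters changes the estimate only marginally; the ratio is dominated by how small the cache is relative to the data set, which matches the remark that the whole set will never fit into memory.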