Re: Slow Get Performance (or how many disk I/O does it take for one non-cached read?)
RE: HDFS compression... that is interesting -- I didn't think HBase forced
any HDFS-specific operations (other than short-circuit reads, which are
configurable on/off)?

... So how is the compression encoding implemented, and how do other file
systems handle it? I don't think compression is specifically part of the
FileSystem API.
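
My best guess: HBase compresses each HFile block in memory with a plain
Hadoop codec and then writes ordinary bytes, so the FileSystem never sees a
codec and any FileSystem implementation works. A minimal sketch of that
codec usage (the payload string is just illustrative):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class CodecDemo {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    CompressionCodec codec = ReflectionUtils.newInstance(GzipCodec.class, conf);
    ByteArrayOutputStream raw = new ByteArrayOutputStream();
    // Compress entirely in memory, the way HBase would compress a block's
    // bytes before handing them to FSDataOutputStream.write().
    CompressionOutputStream out = codec.createOutputStream(raw);
    out.write("some block bytes".getBytes("UTF-8"));
    out.finish();
    // `raw` now holds gzip'd bytes; the FileSystem only ever sees bytes.
    System.out.println("compressed to " + raw.size() + " bytes");
  }
}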
On Sat, Feb 1, 2014 at 11:06 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> HBase always loads the whole block and then seeks forward in that block
> until it finds the KV it is looking for (there is no indexing inside the
> block).
>
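In other words, the block index only gets HBase to the right block; inside
it, the lookup is a forward scan. A toy model of that access pattern in Java
(purely illustrative, not HBase's actual code):

public class BlockScanModel {
  // A "block" here is just a sorted array of key/value pairs, standing in
  // for the KVs packed into one 64k HFile block.
  static String findInBlock(String[][] block, String key) {
    for (String[] kv : block) {
      int cmp = kv[0].compareTo(key);
      if (cmp == 0) return kv[1]; // found the KV
      if (cmp > 0) break;         // walked past where the key would sort
    }
    return null;                  // key is not in this block
  }

  public static void main(String[] args) {
    String[][] block = { { "a", "1" }, { "c", "2" }, { "f", "3" } };
    System.out.println(findInBlock(block, "c")); // prints 2
    System.out.println(findInBlock(block, "d")); // prints null
  }
}

So the per-get cost is dominated by fetching the whole block, which is why
the block size matters.
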
> Also note that HBase has compression and block encoding. These are
> different. Compression compresses the files on disk (at the HDFS level) and
> not in memory, so it does not help with your cache size. Encoding is
> applied at the HBase block level and is retained in the block cache.
>
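To make that distinction concrete: both settings live on the column family
schema. A hedged sketch using the 0.94-era client API (the table name
"mytable" and family "cf" are made up):

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.io.hfile.Compression;

public class CompressionVsEncoding {
  public static void main(String[] args) throws IOException {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    HColumnDescriptor cf = new HColumnDescriptor("cf");
    // Compression acts on HFile blocks as written to disk; blocks are
    // decompressed on read, so the block cache holds them uncompressed.
    cf.setCompressionType(Compression.Algorithm.GZ);
    // Encoding acts inside each HBase block (prefix/diff-encoded keys) and
    // is retained in the block cache, so it shrinks the cache footprint too.
    cf.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);
    admin.disableTable("mytable");
    admin.modifyColumn("mytable", cf);
    admin.enableTable("mytable");
    admin.close();
  }
}
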
> I'm really curious as to what kind of improvement you see with a smaller
> block size. Remember that after you change BLOCKSIZE you need to issue a
> major compaction so that the data is rewritten into smaller blocks.
>
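And a sketch of that BLOCKSIZE change plus the follow-up major compaction,
with the same caveats as above (made-up names, 0.94-era API):

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class ShrinkBlocksize {
  public static void main(String[] args)
      throws IOException, InterruptedException {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    // Fetch the existing schema so other family attributes are preserved.
    HTableDescriptor table = admin.getTableDescriptor(Bytes.toBytes("mytable"));
    HColumnDescriptor cf = table.getFamily(Bytes.toBytes("cf"));
    cf.setBlocksize(16 * 1024); // e.g. 16k, down from the 64k default
    admin.disableTable("mytable");
    admin.modifyColumn("mytable", cf);
    admin.enableTable("mytable");
    // The new size only applies to newly written HFiles, hence the major
    // compaction to rewrite the existing data into smaller blocks.
    admin.majorCompact("mytable");
    admin.close();
  }
}
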
> We should really document this stuff better.
>
>
> -- Lars
>
>
>
> ________________________________
>  From: Jan Schellenberger <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Friday, January 31, 2014 10:31 PM
> Subject: RE: Slow Get Performance (or how many disk I/O does it take for
> one non-cached read?)
>
>
> A lot of useful information here...
>
> I disabled bloom filters.
> I changed to GZ compression (which compressed the files significantly).
>
> I'm now seeing about *80 gets/sec/server*, which is a pretty good
> improvement.
> Since I estimate that the server is capable of about 300-350 hard disk
> operations/second, that's about 4 hard disk operations per get.
>
> I will experiment with the BLOCKSIZE next. Unfortunately, upgrading our
> system to a newer HBase/Hadoop is tricky for various IT/regulation reasons,
> but I'll ask to upgrade. From what I see, even Cloudera 4.5.0 still comes
> with HBase 0.94.6.
>
> I also restarted the regionservers and am now getting
> blockCacheHitCachingRatio=51% and blockCacheHitRatio=51%.
> So conceivably, each get could be hitting:
> the root index (cache hit)
> the block index (cache hit)
> on average 2 blocks loaded to get the data (cache misses most likely, as my
> total heap space is 1/7 the size of the compressed dataset)
> That would be about a 50% cache hit rate overall, and if each data access
> requires 2 hard drive reads (data + checksum), then that would explain my
> throughput. It still seems high, but probably within the realm of reason.
>
> Does HBase always read a full block (the 64k HFile block, not the HDFS
> block) at a time, or can it just jump to a particular location within the
> block?
>
>

--
Jay Vyas
http://jayunit100.blogspot.com