Re: Slow Get Performance (or how many disk I/O does it take for one non-cached read?)
RE: HDFS Compression... that is interesting -- I didn't think HBase forced
any HDFS-specific operations (other than short-circuit reads, which are
configurable on/off)?

... So how is the compression encoding implemented, and how do other file
systems handle it?  I don't think compression is specifically part of the
FileSystem API.
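
For what it's worth, my rough mental model (just a sketch, not the actual
HBase code; fsOut and blockBytes are stand-ins) is that HBase compresses
each HFile block itself with a Hadoop CompressionCodec and then writes the
already-compressed bytes through the generic FileSystem API, so nothing
HDFS-specific is required:

  // org.apache.hadoop.io.compress.* -- illustrative sketch only
  GzipCodec codec = new GzipCodec();
  codec.setConf(new Configuration());
  Compressor compressor = CodecPool.getCompressor(codec);
  OutputStream out = codec.createOutputStream(fsOut, compressor);
  out.write(blockBytes);            // plain byte writes cross the FS API
  out.close();
  CodecPool.returnCompressor(compressor);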
On Sat, Feb 1, 2014 at 11:06 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> HBase always loads the whole block and then seeks forward in that block
> until it finds the KV it is looking for (there is no indexing inside the
> block).
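>
> To make "seeks forward" concrete, a simplified sketch (illustrative Java
> pseudocode, not the real reader code; readBlock and decodeKeyValueAt are
> stand-ins):
>
>   byte[] block = readBlock(blockOffset, onDiskSize);  // whole block, always
>   int pos = 0;
>   while (pos < block.length) {
>     KeyValue kv = decodeKeyValueAt(block, pos);       // no intra-block index
>     int cmp = comparator.compare(kv, target);
>     if (cmp == 0) return kv;                          // found the cell
>     if (cmp > 0) return null;                         // walked past the key
>     pos += kv.getLength();                            // linear forward seek
>   }
>   return null;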
>
> Also note that HBase has compression and block encoding. These are
> different. Compression compresses the files on disk (at the HDFS level) and
> not in memory, so it does not help with your cache size. Encoding is
> applied at the HBase block level and is retained in the block cache.
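>
> To make that concrete, this is roughly how the two knobs are set through
> the 0.94 Java admin API (table/family names are placeholders):
>
>   HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("t"));
>   HColumnDescriptor cf = desc.getFamily(Bytes.toBytes("f"));
>   cf.setCompressionType(Compression.Algorithm.GZ);      // on-disk bytes only
>   cf.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF); // survives into cache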
>
> I'm really curious as to what kind of improvement you see with smaller block
> size. Remember that after you change BLOCKSIZE you need to issue a major
> compaction so that the data is rewritten into smaller blocks.
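>
> For example, a sketch against the 0.94 admin API (names are placeholders):
>
>   HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
>   HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("t"));
>   HColumnDescriptor cf = desc.getFamily(Bytes.toBytes("f"));
>   cf.setBlocksize(16 * 1024);                // down from the 64 KB default
>   admin.disableTable("t");
>   admin.modifyColumn("t", cf);
>   admin.enableTable("t");
>   admin.majorCompact("t");                   // rewrite data into new blocks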
>
> We should really document this stuff better.
>
>
> -- Lars
>
>
>
> ________________________________
>  From: Jan Schellenberger <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Friday, January 31, 2014 10:31 PM
> Subject: RE: Slow Get Performance (or how many disk I/O does it take for
> one non-cached read?)
>
>
> A lot of useful information here...
>
> I disabled bloom filters
> I changed to gz compression (compressed files significantly)
>
> I'm now seeing about *80 gets/sec/server* which is a pretty good
> improvement.
> Since I estimate that the server is capable of about 300-350 hard disk
> operations/second, that's about 4 hard disk operations/get.
>
> I will experiment with the BLOCKSIZE next.  Unfortunately, upgrading our
> system to a newer HBase/Hadoop is tricky for various IT/regulation reasons,
> but I'll ask to upgrade.  From what I see, even Cloudera 4.5.0 still comes
> with HBase 0.94.6.
>
> I also restarted the regionservers and am now getting
> blockCacheHitCachingRatio=51% and blockCacheHitRatio=51%.
> So conceivably, for each Get I could be hitting:
> - root index (cache hit)
> - block index (cache hit)
> - on average 2 blocks loaded to get the data (cache misses, most likely, as
> my total heap space is 1/7 the compressed dataset)
> That would be about 52% cache hits overall, and if each data access
> requires 2 hard drive reads (data + checksum), then that would explain my
> throughput.  It still seems high, but probably within the realm of reason.
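>
> (Working the arithmetic: 2 index hits out of ~4 block lookups per Get is
> 50%, and the occasional data-block hit from a cache holding ~1/7 of the
> data nudges that toward the observed 51%.  Two missed blocks x 2 hard
> drive reads each = ~4 disk ops/Get, and ~320 disk ops/sec / 4 = ~80
> gets/sec, which matches the number above.)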
>
> Does HBase always read a full block (the 64k HFile block, not the HDFS
> block) at a time or can it just jump to a particular location within the
> block?
>
>
> --
> View this message in context:
> http://apache-hbase.679495.n3.nabble.com/Slow-Get-Performance-or-how-many-disk-I-O-does-it-take-for-one-non-cached-read-tp4055545p4055564.html
>
> Sent from the HBase User mailing list archive at Nabble.com.
>

--
Jay Vyas
http://jayunit100.blogspot.com
