HBase, mail # user - Slow Get Performance (or how many disk I/O does it take for one non-cached read?)


Re: Slow Get Performance (or how many disk I/O does it take for one non-cached read?)
Andrew Purtell 2014-02-03, 04:13
To clarify what Lars said: we can do custom encoding of the key values in HFile blocks (FAST_DIFF, etc.) in the cache as well as on disk. We can also, or instead, do whole-block compression using the usual suspects (gzip, Snappy), but only as part of reading or writing HFile blocks "at the HDFS level".
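
For illustration, a minimal sketch of setting both options on a column family through the 0.94-era Java admin API (the table name "mytable" and family "cf" are placeholders, not from the thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.io.hfile.Compression;
import org.apache.hadoop.hbase.util.Bytes;

public class EncodingVsCompression {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Fetch the existing family so only the two settings below change.
    HColumnDescriptor cf = admin.getTableDescriptor(Bytes.toBytes("mytable"))
        .getFamily(Bytes.toBytes("cf"));

    // Block encoding rewrites the KeyValues themselves (FAST_DIFF here),
    // so the space savings carry over into the block cache.
    cf.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);

    // Compression applies to whole HFile blocks as they are written to
    // HDFS; blocks are decompressed before they enter the cache.
    cf.setCompressionType(Compression.Algorithm.GZ);

    // Note: 0.94 may require disabling the table first unless online
    // schema updates are enabled.
    admin.modifyColumn("mytable", cf);
    admin.close();
  }
}

In the shell, the equivalent is along the lines of: alter 'mytable', {NAME => 'cf', DATA_BLOCK_ENCODING => 'FAST_DIFF', COMPRESSION => 'GZ'}.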

> On Feb 1, 2014, at 8:10 PM, Jay Vyas <[EMAIL PROTECTED]> wrote:
>
> RE: HDFS Compression... that is interesting -- I didn't think HBase forced
> any HDFS-specific operations (other than short-circuit reads, which are
> configurable on/off)?
>
> ... So how is the compression encoding implemented, and how do other file
> systems handle it?  I don't think compression is specifically part of the
> FileSystem API.
>
>
>> On Sat, Feb 1, 2014 at 11:06 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>>
>> HBase always loads the whole block and then seeks forward in that block
>> until it finds the KV it  is looking for (there is no indexing inside the
>> block).
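
To make that concrete, here is a simplified, illustrative sketch (not HBase's actual reader code) of why the lookup inside a block is a linear forward seek, assuming the block has already been loaded and decoded into a list of KeyValues:

import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Bytes;

public class BlockSeekSketch {
  // There is no per-KeyValue index inside an HFile block: once the block
  // index has pointed us at a block, the reader walks it front to back.
  static KeyValue seekTo(List<KeyValue> blockKvs, byte[] targetKey) {
    for (KeyValue kv : blockKvs) {
      int cmp = Bytes.compareTo(kv.getKey(), targetKey);
      if (cmp == 0) return kv;   // exact match found
      if (cmp > 0) return null;  // scanned past the slot; key not in block
    }
    return null;                 // reached end of block without a match
  }
}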
>>
>> Also note that HBase has compression and block encoding. These are
>> different. Compression compresses the files on disk (at the HDFS level) and
>> not in memory, so it does not help with your cache size. Encoding is
>> applied at the HBase block level and is retained in the block cache.
>>
>> I'm really curious as to what kind of improvement you see with smaller block
>> size. Remember that after you change BLOCKSIZE you need to issue a major
>> compaction so that the data is rewritten into smaller blocks.
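
A sketch of that sequence with the same-era Java admin API (again with placeholder table/family names; fetching the existing descriptor avoids clobbering the family's other settings):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class BlocksizeChangeSketch {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    HColumnDescriptor cf = admin.getTableDescriptor(Bytes.toBytes("mytable"))
        .getFamily(Bytes.toBytes("cf"));
    cf.setBlocksize(16 * 1024);         // e.g. 16 KB instead of the 64 KB default
    admin.modifyColumn("mytable", cf);  // apply the new BLOCKSIZE to the family
    // Existing HFiles keep their old block size until they are rewritten,
    // hence the major compaction:
    admin.majorCompact("mytable");
    admin.close();
  }
}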
>>
>> We should really document this stuff better.
>>
>>
>> -- Lars
>>
>> ________________________________
>> From: Jan Schellenberger <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]
>> Sent: Friday, January 31, 2014 10:31 PM
>> Subject: RE: Slow Get Performance (or how many disk I/O does it take for
>> one non-cached read?)
>>
>>
>> A lot of useful information here...
>>
>> I disabled bloom filters
>> I changed to gz compression (compressed files significantly)
>>
>> I'm now seeing about *80 gets/sec/server*, which is a pretty good
>> improvement.
>> Since I estimate that the server is capable of about 300-350 hard disk
>> operations/second, that's about 4 hard disk operations/get.
>>
>> I will experiment with the BLOCKSIZE next.  Unfortunately, upgrading our
>> system to a newer HBase/Hadoop is tricky for various IT/regulation reasons,
>> but I'll ask to upgrade.  From what I see, even Cloudera CDH 4.5.0 still
>> ships with HBase 0.94.6.
>>
>> I also restarted the regionservers and am now getting
>> blockCacheHitCachingRatio=51% and blockCacheHitRatio=51%.
>> So conceivably, each get could be hitting:
>> - the root index (cache hit)
>> - the block index (cache hit)
>> - on average 2 data blocks (cache misses most likely, as my total heap
>>   space is 1/7 of the compressed dataset)
>> That would be about a 52% cache hit rate overall, and if each data access
>> requires 2 hard drive reads (data + checksum), that would explain my
>> throughput. It still seems high, but probably within the realm of reason.
>>
>> Does HBase always read a full block (the 64k HFile block, not the HDFS
>> block) at a time or can it just jump to a particular location within the
>> block?
>>
>>
>> --
>> View this message in context:
>> http://apache-hbase.679495.n3.nabble.com/Slow-Get-Performance-or-how-many-disk-I-O-does-it-take-for-one-non-cached-read-tp4055545p4055564.html
>>
>> Sent from the HBase User mailing list archive at Nabble.com.
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com