HBase >> mail # user >> Could it improve read performance by storing HFile consecutive on disk?


Could it improve read performance by storing HFile consecutive on disk?
In our use case the memory/cache is small, and we want to improve read/load
(from-disk) performance by storing HFile blocks consecutively on disk.
The idea is that if blocks are stored closer together on disk, then reading a
data block from an HFile would require fewer random disk accesses.

In particular, looking up a value or reading a data block in an HFile requires
a B-tree-style root-to-leaf traversal. Each step of the traversal needs to
load a block from disk. Since the blocks along the root-to-leaf path are not
stored consecutively, those reads are typically random. I am wondering whether
we could store all the blocks on a root-to-leaf path in one consecutive disk
area, so that the random reads become sequential reads, which should be
faster.
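
A rough back-of-the-envelope sketch of the intuition, in Python. All numbers are assumptions chosen for illustration (a 10 ms seek, 100 MB/s transfer, 64 KB blocks, 3 blocks on the root-to-leaf path), and the function names are hypothetical. It is not a measurement of HBase, and it ignores caching and the file system layer underneath.

```python
# Simple cost model (assumed, not measured): each random read pays one seek
# plus the transfer time of the block; a contiguous read pays one seek and
# then transfers all blocks back to back.

def transfer_ms(block_bytes, mb_per_s):
    """Time in milliseconds to transfer one block at the given throughput."""
    return block_bytes / (mb_per_s * 1e6) * 1000.0

def random_path_ms(num_blocks, block_bytes, seek_ms, mb_per_s):
    """Blocks scattered on disk: one seek per block along the path."""
    return num_blocks * (seek_ms + transfer_ms(block_bytes, mb_per_s))

def contiguous_path_ms(num_blocks, block_bytes, seek_ms, mb_per_s):
    """Blocks stored consecutively: one seek, then one sequential read."""
    return seek_ms + num_blocks * transfer_ms(block_bytes, mb_per_s)

if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    path_blocks = 3          # blocks loaded on one root-to-leaf traversal
    block_bytes = 64 * 1024  # 64 KB per block
    seek_ms = 10.0           # assumed average seek + rotational delay
    mb_per_s = 100.0         # assumed sequential throughput

    scattered = random_path_ms(path_blocks, block_bytes, seek_ms, mb_per_s)
    packed = contiguous_path_ms(path_blocks, block_bytes, seek_ms, mb_per_s)
    print(f"scattered: {scattered:.2f} ms, contiguous: {packed:.2f} ms")
```

Under these assumed numbers the seek time dominates, so the saving grows with the number of blocks on the path that actually have to come from disk.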

Regards,
Yun