Could it improve read performance to store HFile blocks consecutively on disk?
yun peng 2013-07-09, 15:49
In our use case the memory/cache is small, so we want to improve read/load
(from-disk) performance by storing HFile blocks consecutively on disk.
The idea is that if blocks are stored closer together on disk, then reading a
data block from an HFile would require fewer random disk accesses.
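To make the expected gain concrete, here is a rough cost model for reading the blocks of one lookup. It is a sketch under assumed parameters: the 10 ms seek time and 100 MB/s transfer rate are hypothetical round numbers for a spinning disk, 64 KiB is the default HFile block size, and `read_cost_ms` is a made-up helper, not an HBase API. Each non-contiguous block read is charged one seek, while contiguous blocks are streamed after a single seek.

```python
# Toy disk cost model (hypothetical parameters, not measured HBase numbers).
# Scattered blocks each pay one seek; contiguous blocks pay a single seek
# and are then streamed at the transfer rate.

def read_cost_ms(num_blocks: int, block_kb: int = 64, contiguous: bool = False,
                 seek_ms: float = 10.0, mb_per_s: float = 100.0) -> float:
    """Estimated time in ms to read num_blocks blocks of block_kb KiB each."""
    transfer_ms = num_blocks * block_kb / 1024 / mb_per_s * 1000
    if num_blocks == 0:
        seeks = 0
    elif contiguous:
        seeks = 1
    else:
        seeks = num_blocks
    return seeks * seek_ms + transfer_ms


if __name__ == "__main__":
    # Example: a 3-block lookup (e.g. two index blocks plus one data block).
    scattered = read_cost_ms(3)
    together = read_cost_ms(3, contiguous=True)
    print(f"scattered: {scattered:.3f} ms, consecutive: {together:.3f} ms")
```

Under these assumptions the seek time (30 ms versus 10 ms) dwarfs the roughly 1.9 ms of transfer time, which is why cutting the number of random reads matters far more than the number of bytes read.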
In particular, looking up a value or reading a data block in an HFile requires
a B-tree-style root-to-leaf traversal, and each step of that traversal loads a
block from disk. Since the blocks along a root-to-leaf path are not stored
consecutively, those reads are typically random. I wonder whether we could
store all the blocks on a root-to-leaf path in a consecutive disk area; that
would turn the random reads into sequential reads, which should be faster.
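The traversal can be modelled with a toy index tree to count how many seeks one root-to-leaf lookup costs under two layouts. Everything here is hypothetical: the tree is complete with a fixed fanout, offsets are measured in whole blocks, a seek is counted whenever the next block does not immediately follow the previous one, and the two layouts (level by level, versus pre-order where each block is written right after its parent) are illustrative choices, not the actual HFile on-disk format.

```python
from itertools import product

# Nodes are named by their child-index path: () is the root, (2, 0) is the
# first child of the root's third child, and so on.

def level_order_layout(fanout, depth):
    """Hypothetical layout: all blocks of level 0, then level 1, and so on."""
    layout = {}
    for level in range(depth + 1):
        for node in product(range(fanout), repeat=level):
            layout[node] = len(layout)
    return layout

def preorder_layout(fanout, depth):
    """Hypothetical layout: each block is written right after its parent."""
    layout = {}
    def visit(node):
        layout[node] = len(layout)
        if len(node) < depth:
            for child in range(fanout):
                visit(node + (child,))
    visit(())
    return layout

def root_to_leaf(leaf):
    """All nodes on the path from the root down to the given leaf."""
    return [leaf[:k] for k in range(len(leaf) + 1)]

def count_seeks(path, layout):
    """Count reads whose block is not directly after the previous block."""
    seeks, prev = 0, None
    for node in path:
        offset = layout[node]
        if prev is None or offset != prev + 1:
            seeks += 1
        prev = offset
    return seeks


if __name__ == "__main__":
    fanout, depth = 4, 3
    layouts = {"level-order": level_order_layout(fanout, depth),
               "pre-order": preorder_layout(fanout, depth)}
    leaves = list(product(range(fanout), repeat=depth))
    for name, layout in layouts.items():
        total = sum(count_seeks(root_to_leaf(leaf), layout) for leaf in leaves)
        print(f"{name}: {total / len(leaves):.2f} seeks per lookup on average")
```

Because upper-level blocks are shared by many paths, a single layout cannot make every path contiguous: in this model the pre-order layout reads the leftmost path with one seek, while the rightmost path still needs four.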