HBase >> mail # user >> Could it improve read performance by storing HFiles consecutively on disk?


Could it improve read performance by storing HFiles consecutively on disk?
In our use case memory/cache is small, and we want to improve read/load
(from-disk) performance by storing HFile blocks consecutively on disk...
The idea is that if we store blocks closer together on disk, then reading a
data block from an HFile would require fewer random disk accesses.

In particular, to look up a value or read a data block in an HFile, a
b-tree style root-to-leaf traversal is needed. Each step of the traversal
loads a block from disk. Since the blocks along a root-to-leaf path are not
stored consecutively, those reads are typically random. I am wondering
whether, if we stored all the blocks of a root-to-leaf path in a
consecutive disk area, we could translate the random reads into sequential
reads, which should be faster.
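To make the intuition concrete, here is a toy back-of-the-envelope model (not HBase code; the function name and cost model are my own assumptions) counting disk seeks per point lookup. It assumes one block read per index level plus one for the data block, and that a contiguous root-to-leaf layout collapses the path into a single seek followed by a sequential read:

```python
# Toy model (hypothetical, not from HBase): estimate disk seeks for one
# point lookup through a multi-level block index.

def seeks_per_lookup(index_levels: int, contiguous_path: bool) -> int:
    """Return the number of disk seeks for a single lookup.

    index_levels    -- depth of the block index (root to leaf)
    contiguous_path -- True if the blocks along the root-to-leaf path
                       are laid out consecutively on disk
    """
    # Blocks touched: one index block per level, plus the data block.
    blocks_touched = index_levels + 1
    if contiguous_path:
        # One seek to the start of the path; the rest is sequential read.
        return 1
    # Scattered layout: every block read is a separate random seek.
    return blocks_touched

# With a 3-level index, a scattered layout costs 4 random seeks,
# while a contiguous path layout costs only 1.
print(seeks_per_lookup(3, contiguous_path=False))  # 4
print(seeks_per_lookup(3, contiguous_path=True))   # 1
```

Under this simplified model, the win grows with index depth, which is why the idea matters most when the cache is too small to pin the index blocks in memory.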

Regards,
Yun