HBase >> mail # user >> Minor compactions and impact of number of HFiles within a Store
Minor compactions and impact of number of HFiles within a Store
Hi there,

I am about to tune major/minor compaction behavior, and I'm wondering what
exactly the negative aspects are of handling many HFiles (let's say between
3 and 20) within a single region, considering there are only a few regions
(~10) per RS.

My 2 cents:
- The OS and HBase have to handle more file descriptors
- A random GET may have to search several files (though I have set up
Bloom filters)
- The combined overhead of IndexSize / BloomSize is a bit larger than with a
single file
- Rewriting the data into a new HFile might improve data locality

And my questions:
- How critical can this be?
- Do minor compactions help reduce major compaction time? (e.g. for the
same data volume, is it faster to merge 3 files than 20 files?)
- Considering I have 100% data locality, compaction will generate lots of
disk I/O when reading the HFiles, but does the network layer "block" anything
while the new HFile is written and its HDFS blocks are spread among the
DataNodes?
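
To make the second question more concrete, here is a toy sketch of the CPU side of the problem only. It is not HBase code: it assumes a compaction behaves like a k-way merge of sorted runs, and it counts key comparisons for the same number of keys split across 3 or 20 "files". The names `CountingKey`, `split_into_sorted_runs` and `merge_cost` are made up for this illustration, and disk and network I/O (which may well dominate in practice) are ignored entirely.

```python
import heapq


class CountingKey:
    """Wraps a row key and counts every '<' comparison made on it."""

    comparisons = 0
    __slots__ = ("key",)

    def __init__(self, key):
        self.key = key

    def __lt__(self, other):
        CountingKey.comparisons += 1
        return self.key < other.key


def split_into_sorted_runs(total_keys, num_files):
    """Deal keys 0..total_keys-1 round-robin into num_files sorted runs."""
    return [list(range(i, total_keys, num_files)) for i in range(num_files)]


def merge_cost(total_keys, num_files):
    """Return (merged_keys, comparisons) for a k-way merge of the runs."""
    runs = split_into_sorted_runs(total_keys, num_files)
    wrapped = [[CountingKey(k) for k in run] for run in runs]
    CountingKey.comparisons = 0
    merged = [item.key for item in heapq.merge(*wrapped)]
    return merged, CountingKey.comparisons


if __name__ == "__main__":
    for files in (3, 20):
        _, cost = merge_cost(60_000, files)
        print(f"{files} files: {cost} key comparisons")
```

In this model the comparison count grows roughly with the logarithm of the number of files, so the 20-file merge costs more than the 3-file merge but far less than 20/3 times as much. Whether that matters next to the I/O cost is exactly what I would like to understand.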

--
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me