HBase, mail # user - Minor compactions and impact of number of HFiles within a Store


Minor compactions and impact of number of HFiles within a Store
Adrien Mogenet 2012-12-08, 19:19
Hi there,

I am about to tune major/minor compaction behavior, and I'm wondering what
the exact (negative) effects are of handling many HFiles (let's say between
3 and 20) within a single region, given that there are only a few regions
(~10) per RS.

My 2 cents:
- The OS and HBase have to handle more file descriptors
- A random GET potentially has to search several files (but I have
set up Bloom filters)
- The overhead of IndexSize / BloomSize is a bit larger than with a single file
- We might increase data locality when rewriting a new HFile
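
To put rough numbers on the first two points, here is a back-of-the-envelope sketch in Python. It is not HBase code: the function names are made up for illustration, the one-descriptor-per-HFile count is a simplification, and the read model assumes the row lives in exactly one HFile while every other file is consulted only on a Bloom filter false positive. The 1% false-positive rate is an assumed value, not a measured one.

```python
def store_file_count(regions, families, hfiles_per_store):
    """Total HFiles on one region server (hypothetical helper).

    Assumes every store (region x column family) holds the same
    number of HFiles, and that each HFile costs one descriptor.
    """
    return regions * families * hfiles_per_store


def expected_files_read(hfiles_per_store, bloom_false_positive_rate):
    """Expected HFiles touched by a random GET (simplified model).

    Assumes the row is in exactly one file; each of the other files
    is read only when its Bloom filter gives a false positive.
    """
    others = hfiles_per_store - 1
    return 1 + others * bloom_false_positive_rate


# ~10 regions per RS, one column family (assumed), 3 vs 20 HFiles per store
for n in (3, 20):
    total = store_file_count(regions=10, families=1, hfiles_per_store=n)
    reads = expected_files_read(n, bloom_false_positive_rate=0.01)
    print(f"{n} HFiles/store: {total} files on the RS, ~{reads:.2f} files per GET")
```

Under these assumptions the descriptor count grows linearly with the number of HFiles, while the Bloom filters keep the per-GET cost close to one file. Scans do not benefit from row Bloom filters, so the picture is different for them.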

And my questions:
- How critical can this become?
- Do minor compactions help reduce major compaction time? (e.g. for
the same data volume, is it faster to merge 3 files rather than 20 files?)
- Considering I have 100% data locality, compaction will generate lots of
disk I/O reading the HFiles, but does the network layer "block" anything
when writing the new HFile and spreading its HDFS blocks among the
DataNodes?
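
To make the "3 files vs. 20 files" question concrete, here is a minimal Python sketch of a k-way merge over sorted runs, which is the general technique a compaction resembles. It is only an illustration and not how HBase implements compaction; the function names are hypothetical. For a fixed number of entries, a heap-based merge needs on the order of N * log2(k) key comparisons, so the comparison work grows slowly with the number of input files, while the bytes read and rewritten stay the same.

```python
import heapq
import math


def merge_sorted_runs(runs):
    """Merge already-sorted lists into one sorted list (k-way merge)."""
    return list(heapq.merge(*runs))


def relative_compare_cost(n_entries, k_files):
    """Rough comparison count for a heap-based k-way merge.

    Order-of-magnitude estimate only: N * log2(k), ignoring constants.
    """
    if k_files <= 1:
        return 0.0
    return n_entries * math.log2(k_files)


merged = merge_sorted_runs([[1, 4], [2, 5], [3, 6]])
print(merged)

# Same data volume, different number of input files
ratio = relative_compare_cost(1_000_000, 20) / relative_compare_cost(1_000_000, 3)
print(f"20-way vs 3-way comparison cost ratio: {ratio:.2f}")
```

This model says nothing about disk or network I/O, which may well dominate in practice; it only covers the CPU side of the merge.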

--
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me