Minor compactions and impact of number of HFiles within a Store
Adrien Mogenet 2012-12-08, 19:19
Hi there,
I was about to tune major/minor compaction behavior, and I'm wondering what exactly the negative aspects are of handling lots of HFiles (let's say between 3 and 20) within a single region, considering there are only a few regions (~10) per RS.

My 2 cents:
- The OS and HBase have to handle more file descriptors
- A random GET potentially has to search several files (but I set up bloom filters)
- The IndexSize / BloomSize overhead is a bit larger than with a single file
- We might increase data locality when rewriting a new HFile

And my questions:
- How critical can it be?
- Do minor compactions help reduce major compaction time? (e.g. for the same data volume, is it faster to merge 3 files than 20 files?)
- Considering I have 100% data locality, compaction will generate lots of disk I/O reading the HFiles, but is the network layer "blocking" anything when writing the new HFile and spreading its HDFS blocks among the DataNodes?

--
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me