Smart Managed Major Compactions
Bryan Beaudreault 2012-07-18, 17:26
Before I start, I'm running cdh3u2, so 0.90.4.
I am looking into managing major compactions ourselves, but there doesn't appear to be any mechanism I can hook into to determine which tables need compacting. Ideally, each time my cron job runs it would compact the table that has gone the longest since its last major compaction, but I can't find a way to access this metric.
The default major compaction algorithm seems to use the oldest modification time across all store files in a region to determine when that region was last major compacted. I know this is not ideal, but it seems good enough. Unfortunately, I don't see an easy way to get at this value.
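A minimal sketch of that selection, assuming the store file modification times have already been collected somehow (how to collect them is exactly the open question here). The input shape, the function names, and the rule that a table is as stale as its most overdue region are all assumptions made for illustration, not anything HBase provides.

```python
from typing import Dict, List, Optional

# Hypothetical input shape (an assumption, not an HBase structure):
# table name -> region name -> store file modification times in epoch seconds.
StoreFileTimes = Dict[str, Dict[str, List[float]]]


def region_last_compacted(store_file_mtimes: List[float]) -> Optional[float]:
    """Estimate a region's last major compaction as its oldest store file mtime."""
    return min(store_file_mtimes) if store_file_mtimes else None


def next_table_to_compact(tables: StoreFileTimes) -> Optional[str]:
    """Return the table whose most overdue region has the oldest estimate."""
    best_table: Optional[str] = None
    best_time: Optional[float] = None
    for table, regions in tables.items():
        estimates = [
            t
            for t in (region_last_compacted(m) for m in regions.values())
            if t is not None
        ]
        if not estimates:
            continue  # no store files at all, so nothing to compact
        # Assumption: a table is as stale as its stalest region.
        table_time = min(estimates)
        if best_time is None or table_time < best_time:
            best_table, best_time = table, table_time
    return best_table
```

The cron job would call next_table_to_compact once per run and then request a major compaction for the returned table.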
Alternatively, I can keep my own compaction log, but I'd rather not have to do that if there is another way. Is there some easy way to access this value that I am not seeing? I know I could construct the paths to the store files myself, but this seems less than ideal as well (e.g. it might break when we upgrade).
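For comparison, the compaction-log fallback could be as small as the sketch below. It assumes a JSON file owned by the cron job that maps each table name to the time a major compaction was last requested; the file layout and the function names are hypothetical. Because a major compaction request returns before the compaction finishes, the log records request times, not completion times.

```python
import json
import os
import time
from typing import Dict, List, Optional


def load_log(path: str) -> Dict[str, float]:
    """Read the table -> last-requested-time map, or an empty map if absent."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def pick_table(log: Dict[str, float], tables: List[str]) -> Optional[str]:
    """Pick the table with the oldest entry; tables never logged sort first."""
    if not tables:
        return None
    return min(tables, key=lambda t: log.get(t, 0.0))


def record_compaction(path: str, table: str, now: Optional[float] = None) -> None:
    """Record that a major compaction was requested for the table."""
    log = load_log(path)
    log[table] = time.time() if now is None else now
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(log, f)
    os.replace(tmp, path)  # swap in the new file so a crash cannot leave a partial log
```

Each cron run would call pick_table with the current list of tables, request the compaction, and then call record_compaction for that table.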