[HBase 0.92.1] Too many store files to compact, compaction moving slowly
HBase version : 0.92.1
Hadoop version: 0.20.2-cdh3u0
* hbase.regionserver.fileSplitTimeout : 300000
* hbase.hstore.compactionThreshold : 3
* hbase.hregion.max.filesize : 2147483648
* hbase.hstore.compaction.max : 10
* hbase.hregion.majorcompaction: 864000000000
* HBASE_HEAPSIZE : 4000
Somehow a user has got his table into a complicated state. The table
has 299 regions, of which roughly 28 have a huge number of store
files, as many as 2300 files (snapshot:
http://pastie.org/pastes/3907336/text). To complicate matters further,
the individual store files are as big as 14GB.
I am now trying to rebalance the data in this table. I tried manual
splits, but the split requests failed with the error "Took too
long to split the files and create the references, aborting split".
To get around this I increased hbase.regionserver.fileSplitTimeout.
From this point on, splits succeeded. I identified 10 regions that had
too many store files and split them. After the splits, daughter
regions were created with references to all the store files in the parent
region, and compactions started. A minor compaction picks up at most 10
files (hbase.hstore.compaction.max). Since there are 2000+ files (taking
one region as an example), it will take roughly 200 sweeps of minor
compaction. Each sweep runs slowly (a couple of hours), since the
individual files (in each set of 10) are so big.
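
To make the sweep arithmetic concrete, here is a rough back-of-the-envelope
sketch in Python. It is only an illustration under simplifying assumptions:
every pass merges up to hbase.hstore.compaction.max files into one file, no
new flushes arrive in the meantime, and HBase's real file-selection logic is
ignored. The function name and the 2-hours-per-pass figure are my own
placeholders, not anything taken from HBase.

```python
def minor_compaction_passes(store_files, max_per_compaction, target_files=1):
    """Estimate how many minor compaction passes are needed.

    Simplified model (an assumption, not HBase's actual selection policy):
    each pass merges up to `max_per_compaction` files into a single file,
    so the file count drops by at most `max_per_compaction - 1` per pass.
    """
    if max_per_compaction < 2:
        raise ValueError("a compaction must merge at least 2 files")
    passes = 0
    remaining = store_files
    while remaining > target_files:
        merged = min(max_per_compaction, remaining)
        remaining -= merged - 1  # `merged` files in, 1 file out
        passes += 1
    return passes


if __name__ == "__main__":
    passes = minor_compaction_passes(2000, 10)
    hours_per_pass = 2  # assumed value for "a couple of hours" per sweep
    print(passes, "passes, about", passes * hours_per_pass, "hours")
```

Under this model 2000 files with a limit of 10 come out at 223 passes,
slightly more than 200, because each pass removes only nine files net.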
Now coming to questions:
A] Given that we can afford downtime for this table (and for the cluster
if needed), can I do something *better* than manual splits and waiting for
compactions to complete? (I am picturing a tool which scans all the HDFS
directories under the table and launches a distributed *compact and split
if needed* job, or something along those lines.)
B] If not (A), can I temporarily tweak some configurations (other than
the heap given to the region server) to get the table to a healthy state?
C] How did we end up with individual files as big as 15GB when our max
region size is configured to be 2GB?
My theory is that during the writes all requests consistently went to the
same region server and we flushed faster than we could compact. Happy
to be proved otherwise.