We disabled automated major compactions by setting
hbase.hregion.majorcompaction to 0. This was to avoid issues during bulk
import of data, since compactions seemed to cause the running imports to
crash. However, even after disabling them, region server logs still show
compactions going on, as well as aborted compactions. We also get
compaction queue size warnings in Cloudera Manager.
Why is this the case?
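For reference, this is the kind of setting involved (a sketch; hbase.hregion.majorcompaction is the standard property for disabling time-based major compactions, but verify the name and semantics against your HBase version):

```xml
<!-- hbase-site.xml: disable periodic, time-based major compactions.
     Note: minor compactions still run, and a minor compaction can be
     promoted to a major one, so compaction activity in the logs does
     not necessarily mean the setting is being ignored. -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value>
</property>
```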
To be fair, we only disabled automated compactions AFTER the import failed
for the first time (yes, HBase was restarted), so maybe there are some
trailing compactions, but the queue size keeps increasing, which I guess
should not be the case. Then again, I don't know how aborted compactions
are counted - i.e. I am not sure whether or not to trust the metrics on this.
A bit more about what I am trying to accomplish:
I am bulk loading about 100 indexed .lzo files, each containing 20 * 10^6
key/value pairs (~0.5 KB each), into an HBase table. Each file is loaded by
a separate mapper job, and several of these jobs run in parallel to make
sure all task trackers are used. The key distribution is the same in each
file, so even region growth is to be expected. We did not pre-split the
table, as this did not seem to have been a limiting factor earlier.
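Back-of-the-envelope, that import is on the order of a terabyte, which helps explain the heavy split/compaction activity (a sketch; the 10 GB max region size is an assumed value of hbase.hregion.max.filesize, not something confirmed for this cluster):

```python
# Rough size of the bulk load described above.
files = 100
kvs_per_file = 20 * 10**6          # 20 million key/value pairs per file
kv_size_bytes = 512                # ~0.5 KB per key/value pair

total_bytes = files * kvs_per_file * kv_size_bytes
total_tib = total_bytes / 1024**4
print(f"total: {total_tib:.2f} TiB")   # roughly 0.93 TiB

# With an assumed max region size of 10 GB (hbase.hregion.max.filesize),
# the table ends up with on the order of this many regions, all created
# by splitting during the load if the table was not pre-split:
max_region_bytes = 10 * 1024**3
regions = total_bytes / max_region_bytes
print(f"~{regions:.0f} regions")       # roughly 95
```

Every one of those splits and the follow-up compactions happens while the import is running, which is one reason pre-splitting is usually recommended for loads of this size.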
On a related note: what experience, if any, do other HBase/Cloudera users
have with the snapshotting feature detailed below?
We need a robust way to do inter-cluster cloning/backup of tables,
preferably without taking the source table offline or impacting performance
of the source cluster. We only use HDFS files for importing because the
CopyTable job needs to run on the source cluster and cannot be resumed once
it fails.
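For anyone comparing notes, the snapshot-based workflow looks roughly like this (a sketch; table and snapshot names are made up, and this assumes a snapshot-capable HBase, i.e. 0.94.6+/0.95+):

```shell
# In the HBase shell: take an online snapshot (no table downtime).
# hbase shell
#   snapshot 'my_table', 'my_table_snap'

# Ship the snapshot's files to another cluster as a MapReduce job.
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot my_table_snap \
  -copy-to hdfs://backup-cluster:8020/hbase \
  -mappers 16

# On the destination cluster, materialize a table from the snapshot.
# hbase shell
#   clone_snapshot 'my_table_snap', 'my_table_restored'
```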
Jean-Marc Spaggiari 2013-07-09, 14:41
Bryan Beaudreault 2013-07-09, 14:48
Ted Yu 2013-07-09, 14:58
David Koch 2013-07-09, 16:41
Michael Segel 2013-07-09, 16:53
David Koch 2013-07-10, 07:19