HBase, mail # user - Disabled automated compaction - table still compacting


David Koch 2013-07-09, 11:38
Re: Disabled automated compaction - table still compacting
Jean-Marc Spaggiari 2013-07-09, 14:41
Hi David,

Minor compactions can be promoted to major compactions when all the
files are selected for compaction, and the property below will not
prevent that from happening.

See section 9.7.6.5 of the HBase reference guide: http://hbase.apache.org/book/regions.arch.html

JM
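To illustrate the promotion rule described above, here is a simplified, hypothetical Java model of ratio-based minor compaction selection. It is loosely modelled on HBase's ratio-based file selection but is not HBase's actual code; the class name, the `select` method and its `ratio` and `minFiles` parameters are assumptions made for illustration. The point is only that when no old store file is large enough to be skipped, the "minor" selection ends up covering every file and is treated as a major compaction, which setting hbase.hregion.majorcompaction=0 does not prevent.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of minor compaction selection.
 * Not HBase's implementation: it only shows how a minor selection
 * that happens to pick every store file becomes a major compaction.
 */
public class CompactionPromotionSketch {

    /** Files picked for compaction and whether the result counts as major. */
    record Selection(List<Long> files, boolean major) {}

    /** Sum of the sizes from index 'from' to the newest file. */
    static long sumOf(List<Long> sizes, int from) {
        long sum = 0;
        for (int i = from; i < sizes.size(); i++) sum += sizes.get(i);
        return sum;
    }

    /**
     * @param sizes    store file sizes, ordered oldest first
     * @param ratio    skip a file larger than ratio * (sum of all newer files)
     * @param minFiles minimum number of files needed to compact at all
     */
    static Selection select(List<Long> sizes, double ratio, int minFiles) {
        int start = 0;
        // Skip old files that dwarf the newer files after them,
        // as long as enough files remain to reach minFiles.
        while (sizes.size() - start >= minFiles
                && sizes.get(start) > ratio * sumOf(sizes, start + 1)) {
            start++;
        }
        List<Long> picked = new ArrayList<>(sizes.subList(start, sizes.size()));
        if (picked.size() < minFiles) {
            return new Selection(List.of(), false); // nothing worth compacting
        }
        // Promotion rule: if every store file was selected, treat it as major.
        boolean major = picked.size() == sizes.size();
        return new Selection(picked, major);
    }

    public static void main(String[] args) {
        // Similarly sized files: none is skipped, all are selected, so major=true.
        System.out.println(select(List.of(100L, 100L, 100L, 100L), 1.2, 3));
        // One large old file: it is skipped, so the compaction stays minor.
        System.out.println(select(List.of(10_000L, 100L, 100L, 100L), 1.2, 3));
    }
}
```

With many similarly sized files, nothing is filtered out and the selection is flagged major; only when an old file is much larger than the newer ones does the compaction stay minor.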
2013/7/9 David Koch <[EMAIL PROTECTED]>:
> Hello,
>
> We disabled automated major compactions by setting
> hbase.hregion.majorcompaction=0.
> This was to avoid issues during bulk import of data since compactions
> seemed to cause the running imports to crash. However, even after
> disabling, region server logs still show compactions going on, as well as
> aborted compactions. We also get compaction queue size warnings in Cloudera
> Manager.
>
> Why is this the case?
>
> To be fair, we only disabled automated compactions AFTER the import failed
> for the first time (yes, HBase was restarted), so maybe there are some
> trailing compactions, but the queue size keeps increasing, which I guess
> should not be the case. Then again, I don't know how aborted compactions
> are counted, i.e. I am not sure whether to trust the metrics on this.
>
> A bit more about what I am trying to accomplish:
>
> I am bulk loading about 100 indexed .lzo files, each with 20 * 10^6
> key-values (0.5 kB each), into an HBase table. Each file is loaded by a
> separate Mapper job, and several of these jobs run in parallel to make sure
> all task trackers are used. Key distribution is the same in each file, so
> even region growth is to be expected. We did not pre-split the table, as it
> does not seem to have been a limiting factor earlier.
>
> On a related note, what experience, if any, do other HBase/Cloudera users
> have with the snapshotting feature detailed below?
>
> http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_20_12.html
>
> We need a robust way to do inter-cluster cloning/backup of tables,
> preferably without taking the source table offline or impacting performance
> of the source cluster. We only use HDFS files for importing because the
> CopyTable job needs to run on the source cluster and cannot be resumed once
> it fails.
>
> Thanks,
>
> /David
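For a sense of scale, the import described above can be sized with simple arithmetic. The sketch below assumes that "0.5kb" means 512 bytes of raw data per key-value and ignores compression, HFile indexes and HDFS replication; the 10 GiB maximum region size is a hypothetical figure chosen for illustration, not a value from this thread. The class and method names are likewise illustrative.

```java
import java.util.Locale;

/**
 * Back-of-the-envelope size of the bulk import described in the thread.
 * Assumption: 0.5 kB = 512 bytes of raw key-value data; real on-disk
 * size differs (compression, HFile indexes, replication).
 */
public class ImportVolumeEstimate {
    static final long FILES = 100;
    static final long KVS_PER_FILE = 20_000_000L;  // 20 * 10^6 per file
    static final long BYTES_PER_KV = 512;          // assumption: 0.5 kB
    static final double GIB = 1024.0 * 1024 * 1024;

    static long totalBytes() {
        return FILES * KVS_PER_FILE * BYTES_PER_KV;
    }

    /** Lower bound on region count if no region may exceed maxRegionBytes. */
    static long minRegions(long totalBytes, long maxRegionBytes) {
        return (totalBytes + maxRegionBytes - 1) / maxRegionBytes; // ceiling division
    }

    public static void main(String[] args) {
        long total = totalBytes();
        System.out.printf(Locale.ROOT, "key-values: %d%n", FILES * KVS_PER_FILE);
        System.out.printf(Locale.ROOT, "per file:   %.2f GiB%n", KVS_PER_FILE * BYTES_PER_KV / GIB);
        System.out.printf(Locale.ROOT, "total:      %.2f GiB%n", total / GIB);
        // Hypothetical 10 GiB maximum region size, for illustration only.
        System.out.printf(Locale.ROOT, "regions >=  %d%n", minRegions(total, 10L * 1024 * 1024 * 1024));
    }
}
```

Run as is, it reports 2000000000 key-values, about 9.54 GiB per file, 953.67 GiB in total, and at least 96 regions under the assumed threshold. A table that is not pre-split starts with a single region, so it can only reach that count by splitting repeatedly while the import is running.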
Bryan Beaudreault 2013-07-09, 14:48
Ted Yu 2013-07-09, 14:58
David Koch 2013-07-09, 16:41
Michael Segel 2013-07-09, 16:53
David Koch 2013-07-10, 07:19