HBase user mailing list: Disabled automated compaction - table still compacting


David Koch 2013-07-09, 11:38
Jean-Marc Spaggiari 2013-07-09, 14:41
Re: Disabled automated compaction - table still compacting
You should be able to limit what JM describes by tuning the following two
configs:

hbase.hstore.compactionThreshold
hbase.hstore.compaction.max

When tuning the above, also beware of hbase.hstore.blockingStoreFiles so you
don't accidentally cause flushes to block, though since you would presumably
be tuning down rather than up, this shouldn't be a problem.
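
For reference, a minimal hbase-site.xml sketch with the properties mentioned
in this thread; the values shown are only illustrative (roughly the 0.94-era
defaults), not recommendations from the thread:

    <property>
      <name>hbase.hregion.majorcompaction</name>
      <value>0</value> <!-- 0 disables time-based major compactions, as in the original post -->
    </property>
    <property>
      <name>hbase.hstore.compactionThreshold</name>
      <value>3</value> <!-- number of StoreFiles in a store before a minor compaction is considered -->
    </property>
    <property>
      <name>hbase.hstore.compaction.max</name>
      <value>10</value> <!-- maximum number of StoreFiles selected for a single compaction -->
    </property>
    <property>
      <name>hbase.hstore.blockingStoreFiles</name>
      <value>7</value> <!-- updates/flushes block once a store reaches this many files -->
    </property>

Note that if every StoreFile in a store ends up being selected, the compaction
is promoted to a major one regardless of these settings, which is the
behaviour JM describes below.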
On Tue, Jul 9, 2013 at 10:41 AM, Jean-Marc Spaggiari <
[EMAIL PROTECTED]> wrote:

> Hi David,
>
> Minor compactions can be promoted to major compactions when all the
> files are selected for compaction, and the property below will not
> prevent that from occurring.
>
> See section 9.7.6.5 here: http://hbase.apache.org/book/regions.arch.html
>
> JM
>
>
> 2013/7/9 David Koch <[EMAIL PROTECTED]>:
> > Hello,
> >
> > We disabled automated major compactions by setting
> > hbase.hregion.majorcompaction=0.
> > This was to avoid issues during bulk import of data, since compactions
> > seemed to cause the running imports to crash. However, even after
> > disabling, region server logs still show compactions going on, as well as
> > aborted compactions. We also get compaction queue size warnings in
> > Cloudera Manager.
> >
> > Why is this the case?
> >
> > To be fair, we only disabled automated compactions AFTER the import
> > failed for the first time (yes, HBase was restarted), so maybe there are
> > some trailing compactions, but the queue size keeps increasing, which I
> > guess should not be the case. Then again, I don't know how aborted
> > compactions are counted, i.e. I am not sure whether or not to trust the
> > metrics on this.
> >
> > A bit more about what I am trying to accomplish:
> >
> > I am bulk loading about 100 indexed .lzo files, each containing 20 * 10^6
> > key-values (0.5 kB each), into an HBase table. Each file is loaded by a
> > separate mapper job, and several of these jobs run in parallel to make
> > sure all task trackers are used. Key distribution is the same in each
> > file, so even region growth is to be expected. We did not pre-split the
> > table, as it does not seem to have been a limiting factor earlier.
> >
> > On a related note: what, if any, experience do other HBase/Cloudera users
> > have with the snapshotting feature detailed below?
> >
> > http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_20_12.html
> >
> > We need a robust way to do inter-cluster cloning/back-up of tables,
> > preferably without taking the source table offline or impacting
> > performance of the source cluster. We only use HDFS files for importing
> > because the CopyTable job needs to run on the source cluster and cannot
> > be resumed once it fails.
> >
> > Thanks,
> >
> > /David
>
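
A rough sketch of the snapshot workflow asked about above, assuming HBase
0.94.6+/CDH 4.2+ with snapshots enabled (hbase.snapshot.enabled=true in 0.94);
the table, snapshot and cluster names are placeholders:

    # hbase shell: take an online snapshot of the source table
    hbase> snapshot 'my_table', 'my_table_snap'

    # clone it to a new table on the same cluster (no data is copied)
    hbase> clone_snapshot 'my_table_snap', 'my_table_copy'

    # copy the snapshot to another cluster's HBase root dir via MapReduce
    $ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
        -snapshot my_table_snap \
        -copy-to hdfs://backup-namenode:8020/hbase \
        -mappers 16

Taking the snapshot is a metadata-only operation, so the source table stays
online; the heavy copying happens in the ExportSnapshot MapReduce job, which
can be re-run if it fails.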
Ted Yu 2013-07-09, 14:58
David Koch 2013-07-09, 16:41
Michael Segel 2013-07-09, 16:53
David Koch 2013-07-10, 07:19