HBase, mail # user - Disabled automated compaction - table still compacting

Re: Disabled automated compaction - table still compacting
Michael Segel 2013-07-09, 16:53
Silly question...

Why are you trying to disable automated compaction?

And then the equally silly question... are you attempting to run full compactions manually?
On Jul 9, 2013, at 11:41 AM, David Koch <[EMAIL PROTECTED]> wrote:

> Hello,
> Thank you for your replies.
> So, as suggested, I tweaked the following settings in Cloudera Manager:
> hbase.hstore.compactionThreshold=10000
> hbase.hstore.compaction.max - I did not touch, I tried setting it to "0"
> but the minimum is 2
> I can't see any compactions being launched but the job still crashes,
> here's a sample of the logs (*)
> Run out of memory; HRegionServer will abort itself immediately
> java.lang.OutOfMemoryError: Direct buffer memory
> Failed open of
> region=<table_name_removed>,\xF7\x98\x98~'\xD3E\x89\xA2\xDF\x10\xB4\x02K\xC2\x17,1373363234166.363473b8981db865db05c47ccdc45355.,
> starting to roll back the global memstore size.
> org.apache.hadoop.hbase.DroppedSnapshotException: region:
> Anyway: http://cdn.memegenerator.net/instances/400x/28234921.jpg
> Should it be that hard to import a reasonable amount of data into a
> default-configured cluster of 10 reasonably powerful machines? If you are
> inclined to help, I'll gladly provide more in-depth information.
> Thank you,
> /David
> (*) (I am browsing this from Cloudera Manager since I don't have shell
> access to the nodes)
> On Tue, Jul 9, 2013 at 4:58 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>> Do you specify startTime and endTime parameters for the CopyTable job ?
>> Cheers
>> On Tue, Jul 9, 2013 at 4:38 AM, David Koch <[EMAIL PROTECTED]> wrote:
>>> Hello,
>>> We disabled automated major compactions by setting
>>> hbase.hregion.majorcompaction=0.
>>> This was to avoid issues during bulk import of data since compactions
>>> seemed to cause the running imports to crash. However, even after
>>> disabling, region server logs still show compactions going on, as well as
>>> aborted compactions. We also get compaction queue size warnings in
>>> Cloudera Manager.
>>> Why is this the case?
>>> To be fair, we only disabled automated compactions AFTER the import
>>> failed for the first time (yes, HBase was restarted) so maybe there are
>>> some trailing compactions, but the queue size keeps increasing, which I
>>> guess should not be the case. Then again, I don't know how aborted
>>> compactions are counted, i.e. I am not sure whether or not to trust the
>>> metrics on this.
>>> A bit more about what I am trying to accomplish:
>>> I am bulk loading about 100 indexed .lzo files, each with 20 * 10^6
>>> Key-Values (0.5 KB apiece), into an HBase table. Each file is loaded by a
>>> separate Mapper job; several of these jobs run in parallel to make sure
>>> all task trackers are used. Key distribution is the same in each file, so
>>> even region growth is to be expected. We did not pre-split the table as it
>>> does not seem to have been a limiting factor earlier.
>>> On a related note: what, if any, experience do other HBase/Cloudera users
>>> have with the Snapshotting feature detailed below?
>>> http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_20_12.html
>>> We need a robust way to do inter-cluster cloning/back-up of tables,
>>> preferably without taking the source table offline or impacting the
>>> performance of the source cluster. We only use HDFS files for importing
>>> because the CopyTable job needs to run on the source cluster and cannot be
>>> resumed once it fails.
>>> Thanks,
>>> /David
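
For a sense of scale, the figures in the original message (about 100 files, 20 * 10^6 Key-Values per file, 0.5 KB each) can be turned into a rough raw-volume estimate, and the unused pre-split option can be sketched as a list of evenly spaced split keys. This is only an illustrative sketch: it assumes "0.5 KB" means 512 bytes, ignores compression and HBase's per-cell overhead, and assumes row keys are spread uniformly over a binary prefix space, which the thread does not confirm. The function names `estimate_raw_bytes` and `even_split_keys` are hypothetical helpers, not part of HBase.

```python
def estimate_raw_bytes(files, kvs_per_file, bytes_per_kv):
    """Raw payload size in bytes, before compression or HBase cell overhead."""
    return files * kvs_per_file * bytes_per_kv


def even_split_keys(num_regions, key_len=2):
    """Return num_regions - 1 split keys, evenly spaced over a key_len-byte
    prefix space. Only sensible if row keys are uniformly distributed
    (an assumption, not something stated in the thread)."""
    space = 256 ** key_len
    return [
        (i * space // num_regions).to_bytes(key_len, "big")
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    # Figures from the original message; 512 bytes per Key-Value is assumed.
    total = estimate_raw_bytes(100, 20 * 10**6, 512)
    print(f"raw volume: {total / 2**40:.2f} TiB")  # roughly 0.93 TiB

    # Hypothetical example: one initial region per machine in a 10-node cluster.
    for key in even_split_keys(10):
        print(key.hex())
```

Split keys like these would be passed to table creation so that the import starts with several regions rather than one; whether that helps here depends on the actual key distribution.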