Thank you for your replies.
So, as suggested, I tweaked the following settings in Cloudera Manager:
hbase.hstore.compaction.max - I had not touched this before; I tried
setting it to "0", but the minimum accepted value is 2
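Concretely, the two knobs discussed so far, in the same property=value
notation as in my original mail below (the values are just what I tried,
and the comments are my own summary, not official documentation):

    hbase.hregion.majorcompaction = 0  # ms between time-based major compactions; 0 disables them
    hbase.hstore.compaction.max = 2    # max store files per minor compaction; CM rejects anything below 2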
I can't see any compactions being launched, but the job still crashes.
Here is a sample of the logs (*):
    Run out of memory; HRegionServer will abort itself immediately
    java.lang.OutOfMemoryError: Direct buffer memory
    Failed open of
    starting to roll back the global memstore size.
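(In case it helps the diagnosis: as far as I understand, "Direct buffer
memory" is the off-heap pool whose size the JVM caps with
-XX:MaxDirectMemorySize, so one thing I could try is raising that limit in
the region server Java options, along these lines - the 2g value is a
guess on my part, not a tested setting:

    # in hbase-env.sh, or the region server Java options in Cloudera Manager
    export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:MaxDirectMemorySize=2g"

)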
Should it be that hard to import a reasonable amount of data into a
default-configured cluster of 10 reasonably powerful machines? If you are
inclined to help, I'll gladly provide more in-depth information.
(*) (I am browsing this from Cloudera Manager since I don't have shell
access to the nodes)
On Tue, Jul 9, 2013 at 4:58 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> Do you specify startTime and endTime parameters for the CopyTable job ?
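(For anyone else following the thread: those parameters are passed as
flags on the CopyTable command line, along these lines - the table name,
peer address and timestamps are placeholders, not our actual values:

    hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
        --starttime=1372636800000 --endtime=1373241600000 \
        --peer.adr=zk1,zk2,zk3:2181:/hbase my_table

)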
> On Tue, Jul 9, 2013 at 4:38 AM, David Koch <[EMAIL PROTECTED]> wrote:
> > Hello,
> > We disabled automated major compactions by setting
> > hbase.hregion.majorcompaction=0.
> > This was to avoid issues during bulk import of data, since compactions
> > seemed to cause the running imports to crash. However, even after
> > disabling, region server logs still show compactions going on, as well as
> > aborted compactions. We also get compaction queue size warnings in
> > Cloudera Manager.
> > Why is this the case?
> > To be fair, we only disabled automated compactions AFTER running the
> > import for the first time (yes, HBase was restarted), so maybe there are
> > some trailing compactions. However, the queue size keeps increasing,
> > which I guess should not be the case. Then again, I don't know how
> > aborted compactions are counted, i.e. I am not sure whether or not to
> > trust the metrics on this.
> > A bit more about what I am trying to accomplish:
> > I am bulk loading about 100 indexed .lzo files, each containing 20 * 10^6
> > key-value pairs (0.5kB each), into an HBase table. Each file is loaded by
> > a separate job, and several of these jobs run in parallel to make sure all
> > task trackers are used. Key distribution is the same in each file, so even
> > region growth is to be expected. We did not pre-split the table, as it did
> > not seem to be a limiting factor earlier (example of what pre-splitting
> > would look like below).
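(To expand on that last point, pre-splitting would have looked something
like this in the HBase shell - table name, column family and split keys
are made up for illustration:

    create 'my_table', 'cf', {SPLITS => ['row05000000', 'row10000000', 'row15000000']}

)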
> > On a related note: what experience, if any, do other HBase/Cloudera users
> > have with the snapshotting feature detailed below?
> > http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_20_12.html
> > We need a robust way to do inter-cluster cloning/back-up of tables,
> > preferably without taking the source table offline or impacting the
> > performance of the source cluster. We only use HDFS files for importing
> > because the CopyTable job needs to run on the source cluster and cannot
> > be resumed if it fails.
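(To expand on this: from my reading of the Cloudera page above, the
snapshot-based flow would be roughly the following - table, snapshot and
cluster names are placeholders, and I have not tested any of it yet,
hence the question:

    # on the source cluster, in the HBase shell (metadata-only, no data copy):
    snapshot 'my_table', 'my_table_snap'

    # then ship the snapshot to the backup cluster with the bundled MR job:
    hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
        -snapshot my_table_snap -copy-to hdfs://backup-nn:8020/hbase

)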
> > Thanks,
> > /David