Re: Issues running a large MapReduce job over a complete HBase table
Gabriel Reid 2010-12-06, 12:30
All of the max heap sizes are left at their default values (i.e. 1000 MB).
The OOMEs that I encountered on the datanodes only occurred when I set
dfs.datanode.max.xcievers unrealistically high (8192) in an effort to
escape the "xceiverCount X exceeds the limit of concurrent xcievers"
errors. The datanodes weren't really crashing hard, but they
were getting OOMEs and becoming unusable until a restart.
On Mon, Dec 6, 2010 at 12:33 PM, Lars George <[EMAIL PROTECTED]> wrote:
> Hi Gabriel,
> What max heap do you give the various daemons? It is really odd that
> you see OOMEs; I would like to know what has consumed the heap. Are you
> saying the Hadoop DataNodes actually crash with the OOME?
> On Mon, Dec 6, 2010 at 9:02 AM, Gabriel Reid <[EMAIL PROTECTED]> wrote:
>> We're currently running into issues with running a MapReduce job over
>> a complete HBase table - we can't seem to find a balance between
>> having dfs.datanode.max.xcievers set too low (and getting
>> "xceiverCount X exceeds the limit of concurrent xcievers") and getting
>> OutOfMemoryErrors on datanodes.
>> When trying to run a MapReduce job on the complete table we inevitably
>> get one of the two errors above eventually, whereas a job that uses a
>> more restrictive Scan with a startRow and stopRow runs without problems.
>> An important note is that the table that is being scanned has a large
>> disparity in the size of the values being stored -- one column family
>> contains values that are all generally around 256 kB in size, while
>> the other column families in the table contain values that are closer
>> to 256 bytes. The hbase.hregion.max.filesize setting is still at the
>> default (256 MB), meaning that we have HFiles for the big column that
>> are around 256 MB, and HFiles for the other columns that are around
>> 256 kB. The dfs.datanode.max.xcievers setting is currently at 2048,
>> and this is running on a 5-node cluster.
>> The table in question has about 7 million rows, and we're using
>> Cloudera CDH3 (HBase 0.89.20100924 and Hadoop 0.20.2).
>> As far as I have been able to discover, the correct thing to do (or to
>> have done) is to set hbase.hregion.max.filesize to a larger value
>> to have a smaller number of regions, which as I understand would probably
>> solve the issue here.
>> My questions are:
>> 1. Is my analysis about having a larger hbase.hregion.max.filesize correct?
>> 2. Is there something else that we can do to resolve this?
>> 3. Am I correct in assuming that the best way to resolve this now is
>> to make the hbase.hregion.max.filesize setting larger, and then use
>> the org.apache.hadoop.hbase.util.Merge tool as discussed at
>> http://osdir.com/ml/general/2010-12/msg00534.html ?
>> Any help on this would be greatly appreciated.
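
A rough back-of-the-envelope sketch of how the region count depends on
hbase.hregion.max.filesize, using only the figures quoted above (7 million
rows, values of about 256 kB in the large column family, a 256 MB maximum,
5 nodes). It assumes that every row holds one value in the large family,
that nothing is compressed, that regions are full at the maximum size (so
the result is closer to a lower bound), and that regions are spread evenly
over the nodes. None of these assumptions is confirmed in the thread, and
the 1024 MB alternative is a hypothetical value chosen for comparison.

```python
# Back-of-the-envelope estimate of region count from the figures in the thread.
# Assumptions (NOT confirmed by the thread): one ~256 kB value per row in the
# large column family, no compression, regions filled to the maximum size and
# evenly spread over the nodes.

KB = 1024
MB = 1024 * KB


def estimate_regions(rows, value_bytes, max_filesize_bytes, nodes):
    """Return (total_regions, regions_per_node), both rounded up."""
    total_bytes = rows * value_bytes
    # Ceiling division: a partly filled region still counts as a region.
    total_regions = -(-total_bytes // max_filesize_bytes)
    regions_per_node = -(-total_regions // nodes)
    return total_regions, regions_per_node


if __name__ == "__main__":
    rows = 7_000_000          # "about 7 million rows"
    value_bytes = 256 * KB    # "generally around 256 kB in size"
    nodes = 5                 # "5-node cluster"
    # 256 MB is the default mentioned in the thread;
    # 1024 MB is a hypothetical larger setting for comparison.
    for max_mb in (256, 1024):
        total, per_node = estimate_regions(rows, value_bytes, max_mb * MB, nodes)
        print(f"max.filesize={max_mb} MB -> ~{total} regions, ~{per_node} per node")
```

Under these assumptions the default setting gives several thousand regions
on five nodes, and quadrupling the maximum file size cuts that to roughly a
quarter. Whether that reduction is enough to relieve the xceiver pressure
is exactly the open question in the message above.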