2g really should be enough, so it's a bit concerning. How many nodes are you
dealing with? That could be a factor. And which version of hadoop are
On Tue, Mar 12, 2013 at 1:35 PM, Mike Hugo <[EMAIL PROTECTED]> wrote:
> I'm setting up accumulo on a small cluster where each node has 96GB of ram
> and 24 cores. Any recommendations on what memory settings to use for the
> accumulo processes, as well as what to use for the hadoop processes (e.g.
> datanode, etc)?
> I did a small test just to try some things standalone on a single node,
> setting the accumulo processes to 2GB of ram and the HADOOP_HEAPSIZE=2000.
> While running a map reduce job with 4 workers (each allocated 1GB of RAM),
> the datanode runs out of memory about 25% of the way into the job and dies.
> The job is basically building an index, iterating over data in one table
> and applying mutations to another - nothing too fancy.
> Since I'm dealing with a subset of data, I set the table split threshold
> to 128M for testing purposes. There are currently about 170 tablets, so
> we're not dealing with a ton of data here. Might this low split threshold
> be a contributing factor?
> Should I increase the HADOOP_HEAPSIZE even further? Or will that just
> delay the inevitable OOM error?
> The exception we are seeing is below.
> ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(...):DataXceiveServer: Exiting due
> to:java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> Thanks for your help!
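
One note on the exception quoted above: "unable to create new native thread"
generally means the JVM asked the OS for another thread and was refused. It
does not mean the Java heap filled up, so raising HADOOP_HEAPSIZE on its own
may not help. A common cause is the per-user process/thread limit (ulimit -u)
for the account running the datanode. Here is a rough sketch for checking
that on a Linux box. The helper names are made up for illustration, and it
assumes /proc is available; run it as the same user as the datanode and pass
the datanode's pid to thread_count().

```python
import os
import resource


def parse_thread_count(status_text):
    """Return the 'Threads:' value from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    return None  # field not present


def thread_count(pid):
    """Read the live thread count of a process (Linux only, needs /proc)."""
    with open("/proc/%d/status" % pid) as f:
        return parse_thread_count(f.read())


def nproc_limit():
    """Soft and hard per-user process/thread limits (what ulimit -u reports)."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


if __name__ == "__main__":
    soft, hard = nproc_limit()
    # A value of -1 (resource.RLIM_INFINITY) means unlimited on Linux.
    print("max user processes: soft=%s hard=%s" % (soft, hard))
    # Demonstrated on this script's own pid; substitute the datanode's pid.
    print("threads in this process:", thread_count(os.getpid()))
```

If the thread count of the datanode climbs toward the soft limit while the
job runs, the limit is the likely culprit rather than the heap size.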