MapReduce >> mail # user >> how to control (or understand) the memory usage in hdfs

Ted 2013-03-23, 04:33
Harsh J 2013-03-23, 05:14
Ted 2013-03-23, 09:00
Re: how to control (or understand) the memory usage in hdfs
I'm guessing your OutOfMemory error is due to an "Unable to create native
thread" message? Do you mind sharing your error logs with us? Because if
it's that, then it's a ulimit/system limits issue and not a real memory
issue.
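
To make that distinction concrete, here is a minimal sketch for telling the two cases apart from a DataNode log line. It assumes the usual JVM wording, "java.lang.OutOfMemoryError: unable to create new native thread" versus "java.lang.OutOfMemoryError: Java heap space". The function names are hypothetical helpers, not part of Hadoop, and the `resource` module is Unix-only.

```python
import resource

# Substrings the JVM prints after "java.lang.OutOfMemoryError:".
NATIVE_THREAD_MARKER = "unable to create new native thread"
HEAP_MARKER = "java heap space"


def classify_oom(log_line):
    """Label a log line as 'thread-limit', 'heap', or 'other'."""
    if "OutOfMemoryError" not in log_line:
        return "other"
    lowered = log_line.lower()
    if NATIVE_THREAD_MARKER in lowered:
        # Points at ulimit/system limits, not at the heap size.
        return "thread-limit"
    if HEAP_MARKER in lowered:
        # The heap really did fill up.
        return "heap"
    return "other"


def max_user_processes():
    """Return the (soft, hard) per-user process limit, as `ulimit -u` reports it."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


if __name__ == "__main__":
    line = "java.lang.OutOfMemoryError: unable to create new native thread"
    print(classify_oom(line), max_user_processes())
```

If the log line classifies as "thread-limit", raising the heap is unlikely to help; the limit reported by `max_user_processes()` for the user running the DataNode is the first thing to check.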

On Sat, Mar 23, 2013 at 2:30 PM, Ted <[EMAIL PROTECTED]> wrote:
> I just checked and after running my tests, I generate only 670 MB of
> data, in 89 blocks.
> What's more, when I ran the test this time, I had increased my memory
> to 2048mb so it completed fine - but I decided to run jconsole through
> the test so I could see what's happening. The data node never
> exceeded 200mb of memory usage. It mostly stayed under 100mb.
> I'm not sure why it would complain about out of memory and shut itself
> down when it had only 1024mb. It was fairly consistently doing that the
> last few days including this morning right before I switched it to
> 2048.
> I'm going to run the test again with 1024mb and jconsole running. None
> of this makes any sense to me.
> On 3/23/13, Harsh J <[EMAIL PROTECTED]> wrote:
>> I run a 128 MB heap size DN for my simple purposes on my Mac and it
>> runs well for what load I apply on it.
>> A DN's primary, growing memory consumption comes from the number of
>> blocks it carries. All of these blocks' file paths are mapped and kept
>> in RAM during its lifetime. If your DN has acquired a lot of blocks by
>> now, say close to a million or more, then 1 GB may not suffice
>> anymore to hold them and you'd need to scale up (add more RAM, or
>> increase the heap size if you have more RAM) or scale out (add another
>> node and run the balancer).
>> On Sat, Mar 23, 2013 at 10:03 AM, Ted <[EMAIL PROTECTED]> wrote:
>>> Hi, I'm new to hadoop/hdfs and I'm just running some tests on my local
>>> machines in a single node setup. I'm encountering out of memory errors
>>> on the jvm running my data node.
>>> I'm pretty sure I can just increase the heap size to fix the errors,
>>> but my question is about how memory is actually used.
>>> As an example, with other things like an OS's disk cache or, say,
>>> databases, if you let it use, for example, 1gb of ram, it will
>>> "work" with what it has available. If the data is more than 1gb,
>>> it just means it'll swap in and out of memory/disk more often, i.e.
>>> the cached data is smaller. If you give it 8gb of ram it still
>>> functions the same, just with better performance.
>>> With my hdfs setup, this does not appear to be true. If I allocate it
>>> 1gb of heap, it doesn't just perform worse / swap data to disk more.
>>> It outright fails with out of memory and shuts the data node down.
>>> So my question is... how do I really tune the memory / decide how much
>>> memory I need to prevent shutdowns? Is 1gb just too small even on a
>>> single machine test environment with almost no data at all, or is it
>>> supposed to work like OS disk caches, where it always works but just
>>> performs better or worse, and I just have something configured wrong?
>>> Basically my objective isn't performance; it's that the server must
>>> not shut itself down. It can slow down but not shut off.
>>> --
>>> Ted.
>> --
>> Harsh J
> --
> Ted.
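
As a rough sanity check on the numbers quoted above, the sketch below turns the "close to a million blocks, 1 GB may not suffice" rule of thumb into back-of-the-envelope arithmetic. The per-block cost of about 1 KB is only an assumption implied by that rule of thumb, not a measured Hadoop figure, and the function name is hypothetical.

```python
# ASSUMPTION: ~1 KB of heap per block, implied by "a million blocks may
# outgrow 1 GB"; this is not a measured Hadoop figure.
ASSUMED_BYTES_PER_BLOCK = 1024


def estimated_block_heap_mb(num_blocks, bytes_per_block=ASSUMED_BYTES_PER_BLOCK):
    """Return the estimated heap, in MB, needed to track num_blocks blocks."""
    return num_blocks * bytes_per_block / (1024 * 1024)


if __name__ == "__main__":
    # 89 blocks is the count reported in the thread; a million is the
    # scale at which a 1 GB heap is said to become tight.
    for blocks in (89, 1_000_000):
        print(f"{blocks:>9} blocks -> ~{estimated_block_heap_mb(blocks):.2f} MB")
```

Under that assumption, 89 blocks account for well under 0.1 MB of heap, so block bookkeeping alone would not plausibly exhaust a 1024mb DataNode.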

Harsh J
Ted 2013-03-24, 08:19