
Ted 2013-03-23, 04:33
Re: how to control (or understand) the memory usage in hdfs
I run a DN with a 128 MB heap for my simple purposes on my Mac, and it
runs well for the load I apply to it.

A DN's primary, growing memory consumption comes from the number of
blocks it carries. The file paths of all these blocks are mapped and
kept in RAM for the lifetime of the DN process. If your DN has acquired
a lot of blocks by now, say close to a million or more, then 1 GB may
no longer suffice to hold them, and you'd need to either scale up
(increase the heap size, adding more RAM first if needed) or scale out
(add another node and run the balancer).
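
As a rough illustration of the reasoning above (heap need grows with the
block count, on top of a small baseline), here is a minimal Python
sketch. The function names are hypothetical, and the per-block cost of
1024 bytes and the 128 MB baseline are placeholder assumptions picked
only to line up with the ballpark figures in this mail. They are not
measured values, so check them against your own DN before relying on
them.

```python
def estimate_dn_heap_mb(num_blocks, bytes_per_block=1024, base_mb=128):
    """Very rough DataNode heap estimate in MB.

    bytes_per_block and base_mb are ASSUMED placeholders, not measured
    Hadoop figures. Replace them with numbers observed on your cluster.
    """
    if num_blocks < 0:
        raise ValueError("num_blocks must be non-negative")
    # Baseline heap plus a per-block cost for the in-memory block map.
    return base_mb + (num_blocks * bytes_per_block) / (1024 * 1024)


def fits_in_heap(num_blocks, heap_mb, **kwargs):
    """Return True if the estimated need is within the configured heap."""
    return estimate_dn_heap_mb(num_blocks, **kwargs) <= heap_mb


if __name__ == "__main__":
    for blocks in (1_000, 100_000, 1_000_000):
        need = estimate_dn_heap_mb(blocks)
        print(f"{blocks} blocks: ~{need:.0f} MB, fits in 1 GB: "
              f"{fits_in_heap(blocks, 1024)}")
```

Under these assumed numbers a DN with about a million blocks lands just
above a 1 GB heap, while a nearly empty test node stays close to the
baseline. If a nearly empty node still runs out of memory, the block
count is probably not the cause.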

On Sat, Mar 23, 2013 at 10:03 AM, Ted <[EMAIL PROTECTED]> wrote:
> Hi I'm new to hadoop/hdfs and I'm just running some tests on my local
> machines in a single node setup. I'm encountering out of memory errors
> on the jvm running my data node.
> I'm pretty sure I can just increase the heap size to fix the errors,
> but my question is about how memory is actually used.
> As an example, with other things like an OS's disk cache or, say,
> databases, if you let it use, for example, 1 GB of RAM, it will
> "work" with what it has available. If the data is larger than 1 GB,
> it just means it'll swap in and out of memory/disk more often, i.e.
> the cached data is smaller. If you give it 8 GB of RAM it still
> functions the same, just with better performance.
> With my hdfs setup, this does not appear to be true. If I allocate it
> 1 GB of heap, it doesn't just perform worse / swap data to disk more.
> It outright fails with out of memory and shuts the data node down.
> So my question is... how do I really tune the memory / decide how much
> memory I need to prevent shutdowns? Is 1 GB just too small even on a
> single-machine test environment with almost no data at all, or is it
> supposed to work like OS disk caches, where it always works but just
> performs better or worse, and I just have something configured wrong?
> Basically my objective isn't performance, it's that the server must
> not shut itself down, it can slow down but not shut off.
> --
> Ted.

Harsh J