How to control (or understand) the memory usage in HDFS
Ted 2013-03-23, 04:33
Hi, I'm new to Hadoop/HDFS and I'm just running some tests on my local
machines in a single-node setup. I'm encountering out-of-memory errors
on the JVM running my data node.
I'm pretty sure I can just increase the heap size to fix the errors,
but my question is about how memory is actually used.
As an example, take something like an OS disk cache or a database: if
you give it, say, 1 GB of RAM, it will "work" with what it has
available. If the data is larger than 1 GB, it just swaps between
memory and disk more often, i.e. the cached portion of the data is
smaller. If you give it 8 GB of RAM it still functions the same way;
only the performance improves.
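To make the "works with what it has" behavior concrete, here is a minimal sketch in Java of the kind of bounded cache I have in mind. This is not HDFS code and says nothing about how the data node actually manages memory; `BoundedCache` and `maxEntries` are names I made up for illustration. It relies on the standard `LinkedHashMap.removeEldestEntry` hook to evict the least recently used entry once a fixed limit is reached.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only (not part of Hadoop/HDFS):
// a cache that never grows beyond maxEntries entries.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently used
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after every insertion; returning true
    // evicts the least recently used entry instead of growing further.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

With a small limit the cache misses more often (slower), with a large limit it misses less often (faster), but in neither case does it run out of memory, which is the behavior I was expecting.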
With my HDFS setup, this does not appear to be true. If I allocate it
1 GB of heap, it doesn't just perform worse or swap data to disk more.
It outright fails with an out-of-memory error and shuts the data node down.
So my question is: how do I really tune the memory, or decide how much
memory I need, to prevent shutdowns? Is 1 GB just too small even for a
single-machine test environment with almost no data at all? Or is it
supposed to work like an OS disk cache, where it always works but just
performs better or worse, and I simply have something configured wrong?
Basically, my objective isn't performance. The server must not shut
itself down: it can slow down, but it must not shut off.