MapReduce, mail # user - how to control (or understand) the memory usage in hdfs


Re: how to control (or understand) the memory usage in hdfs
Harsh J 2013-03-23, 10:35
I'm guessing your OutOfMemory error is then due to an "unable to create new
native thread" message? Do you mind sharing your error logs with us? Because if
it's that, then it's a ulimit/system limits issue and not a real memory
issue.
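
To illustrate the distinction above, here is a minimal Python sketch that sorts an OutOfMemoryError log line into "thread limit" versus "heap" and reads the per-user process limit (the value `ulimit -u` reports). The function names `classify_oom` and `process_limit` and the sample log lines are hypothetical, made up for illustration; they are not taken from Ted's logs or from Hadoop.

```python
try:
    import resource  # Unix-only standard library module
except ImportError:  # e.g. on Windows
    resource = None


def classify_oom(log_line: str) -> str:
    """Roughly classify a JVM OutOfMemoryError log line.

    Returns "thread-limit", "heap", "other-oom", or "not-oom".
    """
    text = log_line.lower()
    if "outofmemoryerror" not in text:
        return "not-oom"
    # The JVM could not start another OS thread: usually a ulimit or
    # system limits problem, not an exhausted Java heap.
    if "unable to create" in text and "native thread" in text:
        return "thread-limit"
    # These messages do point at the Java heap itself.
    if "java heap space" in text or "gc overhead limit exceeded" in text:
        return "heap"
    return "other-oom"


def process_limit():
    """Return (soft, hard) RLIMIT_NPROC, or None where it is unavailable."""
    if resource is None or not hasattr(resource, "RLIMIT_NPROC"):
        return None
    return resource.getrlimit(resource.RLIMIT_NPROC)


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    samples = [
        "java.lang.OutOfMemoryError: unable to create new native thread",
        "java.lang.OutOfMemoryError: Java heap space",
    ]
    for line in samples:
        print(classify_oom(line), "<-", line)
    print("process limit (soft, hard):", process_limit())
```

If the log line falls into the "thread-limit" bucket, raising the heap size is unlikely to help; the limit printed at the end is the more relevant number to look at.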

On Sat, Mar 23, 2013 at 2:30 PM, Ted <[EMAIL PROTECTED]> wrote:
> I just checked and after running my tests, I generate only 670mb of
> data, on 89 blocks.
>
> What's more, when I ran the test this time, I had increased my memory
> to 2048mb so it completed fine - but I decided to run jconsole through
> the test so I could see what's happening. The data node never
> exceeded 200mb of memory usage. It mostly stayed under 100mb.
>
> I'm not sure why it would complain about out of memory and shut itself
> down when the heap was only 1024mb. It was fairly consistently doing that the
> last few days including this morning right before I switched it to
> 2048.
>
> I'm going to run the test again with 1024mb and jconsole running. None
> of this makes any sense to me.
>
> On 3/23/13, Harsh J <[EMAIL PROTECTED]> wrote:
>> I run a 128 MB heap size DN for my simple purposes on my Mac and it
>> runs well for what load I apply on it.
>>
>> A DN's primary, growing memory consumption comes from the number of
>> blocks it carries. All of these blocks' file paths are mapped and kept
>> in RAM during its lifetime. If your DN has acquired a lot of blocks by
>> now, say close to a million or more, then 1 GB may no longer suffice
>> to hold them and you'd need to scale up (add more RAM, or increase the
>> heap size if you have more RAM) or scale out (add another node and run
>> the balancer).
>>
>> On Sat, Mar 23, 2013 at 10:03 AM, Ted <[EMAIL PROTECTED]> wrote:
>>> Hi, I'm new to hadoop/hdfs and I'm just running some tests on my local
>>> machines in a single node setup. I'm encountering out of memory errors
>>> on the jvm running my data node.
>>>
>>> I'm pretty sure I can just increase the heap size to fix the errors,
>>> but my question is about how memory is actually used.
>>>
>>> As an example, with other things like an OS's disk cache or, say,
>>> databases, if you let it use 1gb of ram, it will "work" with what it
>>> has available. If the data is more than 1gb, it just means it'll swap
>>> in and out of memory/disk more often, i.e. the cached data is smaller.
>>> If you give it 8gb of ram it still functions the same, just with
>>> better performance.
>>>
>>> With my hdfs setup, this does not appear to be true. If I allocate it
>>> 1gb of heap, it doesn't just perform worse / swap data to disk more.
>>> It outright fails with out of memory and shuts the data node down.
>>>
>>> So my question is... how do I really tune the memory / decide how much
>>> memory I need to prevent shutdowns? Is 1gb just too small even on a
>>> single machine test environment with almost no data at all, or is it
>>> supposed to work like OS disk caches, where it always works but just
>>> performs better or worse, and I just have something configured wrong?
>>> Basically my objective isn't performance; it's that the server must
>>> not shut itself down. It can slow down but not shut off.
>>>
>>> --
>>> Ted.
>>
>>
>>
>> --
>> Harsh J
>>
>
>
> --
> Ted.

--
Harsh J
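
As a rough worked example of the block-count reasoning in this thread, the Python sketch below estimates how much heap the DataNode's block map might need. The per-block cost is an assumption, not a measured Hadoop figure: it is back-derived from the remark that around a million blocks may outgrow a 1 GB heap, which gives a pessimistic bound of about 1 KB per block. The names `ASSUMED_BYTES_PER_BLOCK`, `estimated_block_map_bytes` and `fraction_of_heap` are hypothetical.

```python
# ASSUMPTION: roughly 1 GB per ~1 million blocks, i.e. about 1 KB per block.
# This is a pessimistic guess derived from the thread, not a measured value.
ASSUMED_BYTES_PER_BLOCK = 1024


def estimated_block_map_bytes(num_blocks: int,
                              bytes_per_block: int = ASSUMED_BYTES_PER_BLOCK) -> int:
    """Estimate the heap bytes used to track num_blocks blocks."""
    if num_blocks < 0:
        raise ValueError("num_blocks must be non-negative")
    return num_blocks * bytes_per_block


def fraction_of_heap(num_blocks: int, heap_mb: int,
                     bytes_per_block: int = ASSUMED_BYTES_PER_BLOCK) -> float:
    """Return the estimated block-map size as a fraction of the heap."""
    heap_bytes = heap_mb * 1024 * 1024
    return estimated_block_map_bytes(num_blocks, bytes_per_block) / heap_bytes


if __name__ == "__main__":
    # 89 blocks is the count reported in the thread; 1,000,000 is the
    # "close to a million" case mentioned as a point where 1 GB may not suffice.
    for blocks in (89, 1_000_000):
        share = fraction_of_heap(blocks, heap_mb=1024)
        print(f"{blocks} blocks: {share:.4%} of a 1024 MB heap")
```

Under this assumption, 89 blocks account for well under a tenth of a percent of a 1024 MB heap, which fits the jconsole observation that the data node stayed under 200mb and suggests the block count is not what caused the shutdowns.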