Machines in my cluster have relatively small physical memory (4GB)
I was wondering if I could reduce the heap size that namenode and jobtracker
The default heap size is 1000MB respectively, and I know that.
The thing is, does that 1000MB mean maximum possible memory that namenode(or
jobtracker) can use?
What I mean is that does namenode start with minimum memory and increase the
memory size all the way up to 1000MB depending on the job status?
Or is namenode given 1000MB from the beginning so that there is no
flexibility at all?
If namenode and jobtracker do start with solid 1000MB then I would have to
dial them down to several hundreds of mega byte since I only 4GB of memory.
2giga bytes of memory taken up just by namenode and jobtracker is too much
an expense for me.
My question also applies to heap size of child JVM. I know that they are
originally given 200MB of heap size.
I intend to increase the heap size to 512MB, but if the heap size allocation
has no flexibility then I'd have to maintain the 200MB configuration.
Take out the 2GB (used by namenode and jobtracker) from the total 4GB, I can
have only 4 map/reduce tasks with 512MB configuration and since I have quad
core CPU this would be a waste.
Oh, and one last thing.
I am using Hadoop streaming.
I read from a book that when you are using hadoop streaming, you should
allocate less heap size to child JVM. (I am not sure if it meant less than
200MB or less than 400MB)
Because streaming does not allow enough memory for user's processes to run.
So what is the optimal heap size for map/reduce tasks in hadoop streaming?
My plan was to increase the heap size of the child JVM to 512MB.
But if what the book says is true, there is no point.