Thank you for the elaborate explanation.
First of all, The total swap memory size is over 4 giga bytes, but the
actual used size around several hundred kilo bytes.
So I guess I can use almost whole 4 giga bytes of physical memory.
The sentence "streaming does not allow enough memory for user's processes to
run" is from the book I study. So I can't say that I exactly understand the
Since streaming jobs are not memory intensive I guess I'll start by using
the "-Xms" option.
And maybe dial down the heap size of datanode and tasktracker a little bit.
I'd love it if I could put some more rams into the system but currently that
is out of option so I'll have to do with tweaks and options :)
Thanks again for the answer.
2010/7/9 Hemanth Yamijala <[EMAIL PROTECTED]>
> Overall, I think the consideration should be about how much load do
> you expect to support on your cluster. For HDFS, there's a good amount
> of information about how much RAM is required to support a certain
> amount of data stored in DFS; something similar can be found for
> Map/Reduce as well. There are also a few configuration options to let
> the Jobtracker use lesser memory. I suppose that depending on your
> load, your answer could really have to be "increase the RAM
> configuration" rather than any tweaks of the JVM heap sizes or any
> other configuration. Please do consider that first.
> Anyway, some answers to your questions inline:
> > Machines in my cluster have relatively small physical memory (4GB)
> How much is the swap ? While it is available for use as well, it is
> not advisable, because once the JVM starts to thrash to disk, in our
> experience, it degrades performance rapidly.
> > I was wondering if I could reduce the heap size that namenode and
> > are assigned.
> > The default heap size is 1000MB respectively, and I know that.
> > The thing is, does that 1000MB mean maximum possible memory that
> > jobtracker) can use?
> > What I mean is that does namenode start with minimum memory and increase
> > memory size all the way up to 1000MB depending on the job status?
> > Or is namenode given 1000MB from the beginning so that there is no
> > flexibility at all?
> If you want you can control this using another parameter -Xms set to
> the JVM. This specifies the VM to start with the specified heap size
> and then increase.
> > If namenode and jobtracker do start with solid 1000MB then I would have
> > dial them down to several hundreds of mega byte since I only 4GB of
> > 2giga bytes of memory taken up just by namenode and jobtracker is too
> > an expense for me.
> > My question also applies to heap size of child JVM. I know that they are
> > originally given 200MB of heap size.
> > I intend to increase the heap size to 512MB, but if the heap size
> > has no flexibility then I'd have to maintain the 200MB configuration.
> > Take out the 2GB (used by namenode and jobtracker) from the total 4GB, I
> > have only 4 map/reduce tasks with 512MB configuration and since I have
> > core CPU this would be a waste.
> Please also take into account datanodes/tasktrackers and the OS itself.
> > Oh, and one last thing.
> > I am using Hadoop streaming.
> > I read from a book that when you are using hadoop streaming, you should
> > allocate less heap size to child JVM. (I am not sure if it meant less
> > 200MB or less than 400MB)
> > Because streaming does not allow enough memory for user's processes to
> > So what is the optimal heap size for map/reduce tasks in hadoop
> > My plan was to increase the heap size of the child JVM to 512MB.
> > But if what the book says is true, there is no point.
> I think the intent is to say that when you are using Streaming, the
> Child task is not really memory intensive as all the work is going to
> be done by the streaming executable and so you can experiment with