-
Is "heap size allocation" of namenode dynamic or static?
edward choi 2010-07-09, 02:31
Hi,
Machines in my cluster have relatively small physical memory (4GB).
I was wondering if I could reduce the heap size assigned to the namenode and jobtracker. I know the default heap size is 1000MB for each. The thing is, does that 1000MB mean the maximum possible memory that the namenode (or jobtracker) can use? What I mean is, does the namenode start with a minimum amount of memory and grow all the way up to 1000MB depending on the job status? Or is the namenode given 1000MB from the beginning, so that there is no flexibility at all?
If the namenode and jobtracker do start with a solid 1000MB each, then I would have to dial them down to several hundred megabytes, since I only have 4GB of memory. 2GB of memory taken up just by the namenode and jobtracker is too much of an expense for me.
My question also applies to the heap size of the child JVMs. I know that they are given 200MB of heap by default. I intend to increase that to 512MB, but if the heap size allocation has no flexibility then I'd have to keep the 200MB configuration. Taking out the 2GB (used by the namenode and jobtracker) from the total 4GB, I can have only 4 map/reduce tasks with the 512MB configuration, and since I have a quad-core CPU this would be a waste.
Oh, and one last thing. I am using Hadoop streaming. I read in a book that when you are using Hadoop streaming, you should allocate less heap to the child JVM (I am not sure if it meant less than 200MB or less than 400MB), because streaming does not allow enough memory for user's processes to run. So what is the optimal heap size for map/reduce tasks in Hadoop streaming? My plan was to increase the heap size of the child JVM to 512MB. But if what the book says is true, there is no point.
-
Re: Is "heap size allocation" of namenode dynamic or static?
Hemanth Yamijala 2010-07-09, 09:03
Edward,
Overall, I think the main consideration should be how much load you expect to support on your cluster. For HDFS, there's a good amount of information about how much RAM is required to support a certain amount of data stored in DFS; something similar can be found for Map/Reduce as well. There are also a few configuration options that let the Jobtracker use less memory. I suppose that, depending on your load, the answer could really have to be "increase the RAM configuration" rather than any tweaks of the JVM heap sizes or any other configuration. Please do consider that first.
Anyway, some answers to your questions inline:
> Machines in my cluster have relatively small physical memory (4GB)
>
How much swap do you have? While swap is available for use as well, relying on it is not advisable, because in our experience performance degrades rapidly once the JVM starts to thrash to disk.
> I was wondering if I could reduce the heap size that namenode and jobtracker
> are assigned.
> The default heap size is 1000MB respectively, and I know that.
> The thing is, does that 1000MB mean maximum possible memory that namenode (or
> jobtracker) can use?
> What I mean is that does namenode start with minimum memory and increase the
> memory size all the way up to 1000MB depending on the job status?
> Or is namenode given 1000MB from the beginning so that there is no
> flexibility at all?
If you want, you can control this using another JVM parameter, -Xms. It tells the VM to start with the specified heap size and then grow the heap as needed, up to the configured maximum.
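To see the difference between the initial and the maximum heap on a given machine, a small standalone program can print what the JVM reports about itself. This is only a sketch: the class name `HeapReport` is made up for illustration, and it relies on the standard `Runtime` methods, where `maxMemory()` roughly corresponds to -Xmx and `totalMemory()` is the heap currently reserved (which starts near -Xms and can grow).

```java
// Prints how much heap the running JVM has reserved versus how much it may grow to.
// HeapReport is a hypothetical name used only for this illustration.
class HeapReport {
    static final long MB = 1024L * 1024L;

    /** Upper bound the heap may grow to (roughly the -Xmx value). */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    /** Heap currently reserved by the JVM; starts near -Xms and can grow. */
    static long committedHeapMb() {
        return Runtime.getRuntime().totalMemory() / MB;
    }

    /** Portion of the reserved heap actually occupied by objects. */
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    public static void main(String[] args) {
        System.out.println("max heap (about -Xmx) : " + maxHeapMb() + " MB");
        System.out.println("reserved heap now     : " + committedHeapMb() + " MB");
        System.out.println("used heap now         : " + usedHeapMb() + " MB");
    }
}
```

Running it as `java -Xms64m -Xmx1000m HeapReport` should show a reserved heap well below the maximum, which suggests that a 1000MB limit is a ceiling rather than memory taken up front.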
> If namenode and jobtracker do start with solid 1000MB then I would have to
> dial them down to several hundred megabytes since I only have 4GB of memory.
> 2 gigabytes of memory taken up just by namenode and jobtracker is too much
> an expense for me.
>
> My question also applies to heap size of child JVM. I know that they are
> originally given 200MB of heap size.
> I intend to increase the heap size to 512MB, but if the heap size allocation
> has no flexibility then I'd have to maintain the 200MB configuration.
> Take out the 2GB (used by namenode and jobtracker) from the total 4GB, I can
> have only 4 map/reduce tasks with 512MB configuration and since I have quad
> core CPU this would be a waste.
>
Please also take into account datanodes/tasktrackers and the OS itself.
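The arithmetic behind this can be sketched as a worst-case budget, where every JVM is assumed to grow to its maximum heap. The class `HeapBudget` and its method are hypothetical, the 512MB OS reserve is an arbitrary placeholder, and the 1000MB figures for the datanode and tasktracker are an assumption that they run with the same default as the other daemons. Real JVM processes also use some memory beyond the heap, so treat the results as optimistic.

```java
// Worst-case memory budget for one 4GB machine, assuming every heap reaches its maximum.
// HeapBudget and all figures other than those quoted in the thread are illustrative assumptions.
class HeapBudget {
    /** Number of child task JVMs that fit after subtracting memory reserved for everything else. */
    static int maxTasks(int physicalMb, int reservedMb, int childHeapMb) {
        if (childHeapMb <= 0) {
            throw new IllegalArgumentException("childHeapMb must be positive");
        }
        int remaining = physicalMb - reservedMb;
        return remaining <= 0 ? 0 : remaining / childHeapMb;
    }

    public static void main(String[] args) {
        int physicalMb = 4096;               // 4GB machine, from the thread
        int namenodeAndJobtracker = 2 * 1000; // default heap for each, from the thread
        int datanodeAndTasktracker = 2 * 1000; // assumption: same default, same machine
        int osReserveMb = 512;               // hypothetical placeholder for the OS

        // Only namenode and jobtracker counted, as in the original question.
        System.out.println(maxTasks(physicalMb, namenodeAndJobtracker, 512)); // 4
        System.out.println(maxTasks(physicalMb, namenodeAndJobtracker, 200)); // 10

        // Also counting datanode, tasktracker and the OS leaves nothing in the worst case.
        int everything = namenodeAndJobtracker + datanodeAndTasktracker + osReserveMb;
        System.out.println(maxTasks(physicalMb, everything, 200)); // 0
    }
}
```

The last figure is why lowering the daemon heaps, or adding RAM, matters more on a single 4GB machine than the choice between 200MB and 512MB for the child JVMs.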
> Oh, and one last thing.
> I am using Hadoop streaming.
> I read from a book that when you are using hadoop streaming, you should
> allocate less heap size to child JVM. (I am not sure if it meant less than
> 200MB or less than 400MB)
> Because streaming does not allow enough memory for user's processes to run.
> So what is the optimal heap size for map/reduce tasks in hadoop streaming?
> My plan was to increase the heap size of the child JVM to 512MB.
> But if what the book says is true, there is no point.
>
I think the intent is to say that when you are using Streaming, the Child task is not really memory intensive as all the work is going to be done by the streaming executable and so you can experiment with much lower values than if you want to run pure Java M/R tasks. I am not sure what you mean by "streaming does not allow enough memory for user's processes to run".
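One way to read this: with Streaming, each task slot holds two processes, the child JVM and the streaming executable, so memory given to the child heap is memory the executable cannot use. The sketch below illustrates that reading only. `StreamingSlotBudget` is a made-up name, and the 300MB figure for the streaming executable is a hypothetical value, not something measured or taken from Hadoop.

```java
// Per-slot memory for a streaming task: child JVM heap plus the external executable.
// StreamingSlotBudget and the example figures are illustrative assumptions.
class StreamingSlotBudget {
    /** Number of streaming task slots that fit in the memory left for tasks. */
    static int slots(int availableMb, int childHeapMb, int executableMb) {
        int perSlotMb = childHeapMb + executableMb;
        if (perSlotMb <= 0) {
            throw new IllegalArgumentException("per-slot memory must be positive");
        }
        return availableMb <= 0 ? 0 : availableMb / perSlotMb;
    }

    public static void main(String[] args) {
        int availableMb = 4096 - 2000; // what is left after namenode and jobtracker
        int executableMb = 300;        // hypothetical footprint of the streaming executable

        System.out.println(slots(availableMb, 512, executableMb)); // 2
        System.out.println(slots(availableMb, 128, executableMb)); // 4
    }
}
```

Under these assumed numbers, shrinking the child heap from 512MB to 128MB doubles the number of slots that fit, which matches the advice to experiment with lower values for streaming jobs.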
Thanks
hemanth
-
Re: Is "heap size allocation" of namenode dynamic or static?
edward choi 2010-07-09, 14:26
Hemanth,
Thank you for the detailed explanation. First of all, the total swap size is over 4 gigabytes, but only a few hundred kilobytes of it are actually in use. So I guess I can use almost the whole 4 gigabytes of physical memory.
The sentence "streaming does not allow enough memory for user's processes to run" is from the book I am studying, so I can't say that I understand exactly what it means.
Since streaming jobs are not memory intensive, I guess I'll start by using the "-Xms" option, and maybe dial down the heap size of the datanode and tasktracker a little bit. I'd love to add more RAM to the systems, but that is not an option right now, so I'll have to make do with tweaks and options :) Thanks again for the answer.
Ed