Re: Is "heap size allocation" of namenode dynamic or static?
Hemanth,

Thank you for the elaborate explanation.
First of all, the total swap size is over 4 gigabytes, but the actual used
size is only around several hundred kilobytes.
So I guess I can use almost the whole 4 gigabytes of physical memory.

The sentence "streaming does not allow enough memory for user's processes to
run" is from the book I'm studying, so I can't say that I exactly understand
it.

Since streaming jobs are not memory intensive, I guess I'll start by using
the "-Xms" option, and maybe dial down the heap sizes of the datanode and
tasktracker a little bit (see the sketch below).
I'd love to put some more RAM into the system, but currently that is out of
the question, so I'll have to make do with tweaks and options :)
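
For reference, a minimal sketch of what that could look like in
conf/hadoop-env.sh (assuming a 0.20-style setup; the sizes below are purely
illustrative, not recommendations):

    # conf/hadoop-env.sh
    # Default -Xmx (in MB) for daemons started via bin/hadoop; 1000 if unset.
    export HADOOP_HEAPSIZE=512

    # Per-daemon overrides; these options are appended after the default -Xmx,
    # and on most JVMs the later -Xms/-Xmx flag wins.
    export HADOOP_NAMENODE_OPTS="-Xms256m -Xmx512m $HADOOP_NAMENODE_OPTS"
    export HADOOP_JOBTRACKER_OPTS="-Xms256m -Xmx512m $HADOOP_JOBTRACKER_OPTS"
    export HADOOP_DATANODE_OPTS="-Xms128m -Xmx256m $HADOOP_DATANODE_OPTS"
    export HADOOP_TASKTRACKER_OPTS="-Xms128m -Xmx256m $HADOOP_TASKTRACKER_OPTS"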
Thanks again for the answer.

Ed
2010/7/9 Hemanth Yamijala <[EMAIL PROTECTED]>

> Edward,
>
> Overall, I think the consideration should be about how much load you
> expect to support on your cluster. For HDFS, there's a good amount of
> information about how much RAM is required to support a certain amount
> of data stored in DFS; something similar can be found for Map/Reduce as
> well. There are also a few configuration options to let the jobtracker
> use less memory. I suspect that, depending on your load, the answer may
> really have to be "increase the RAM configuration" rather than any
> tweaks of the JVM heap sizes or any other configuration. Please do
> consider that first.
>
> Anyway, some answers to your questions inline:
>
> > Machines in my cluster have relatively small physical memory (4GB)
> >
>
> How much swap is there? While it is available for use as well, relying
> on it is not advisable, because once the JVM starts to thrash to disk,
> in our experience, performance degrades rapidly.
>
> > I was wondering if I could reduce the heap size that the namenode and
> > jobtracker are assigned.
> > I know that the default heap size is 1000MB for each.
> > The thing is, does that 1000MB mean the maximum possible memory that
> > the namenode (or jobtracker) can use?
> > What I mean is, does the namenode start with a minimal amount of memory
> > and grow all the way up to 1000MB depending on the job status?
> > Or is the namenode given 1000MB from the beginning, so that there is no
> > flexibility at all?
>
> If you want, you can control this using another parameter, -Xms, passed
> to the JVM. It tells the VM to start with the specified heap size and
> then grow as needed.
>
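
A minimal illustration of how the two flags interact (generic JVM behavior,
not Hadoop-specific; the class name is made up):

    java -Xms256m -Xmx1000m SomeDaemon   # heap starts at 256MB, may grow to 1000MB
    java -Xmx1000m SomeDaemon            # initial size is the JVM's default; 1000MB
                                         # is only a ceiling, not a fixed allocation

So the default 1000MB discussed in this thread is a maximum (-Xmx), not memory
that is grabbed up front.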
> > If the namenode and jobtracker do start with a solid 1000MB, then I would
> > have to dial them down to several hundred megabytes, since I only have
> > 4GB of memory. 2 gigabytes of memory taken up just by the namenode and
> > jobtracker is too much of an expense for me.
> >
> > My question also applies to the heap size of the child JVMs. I know that
> > they are originally given 200MB of heap.
> > I intend to increase the heap size to 512MB, but if the heap size
> > allocation has no flexibility then I'd have to keep the 200MB
> > configuration.
> > Taking out the 2GB (used by the namenode and jobtracker) from the total
> > 4GB, I can have only 4 map/reduce tasks with the 512MB configuration,
> > and since I have a quad-core CPU this would be a waste.
> >
>
> Please also take into account datanodes/tasktrackers and the OS itself.
>
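
As a rough back-of-envelope (assuming every daemon shares the same 4GB box
and keeps the 1000MB default):

    namenode + jobtracker                  ~ 2000 MB
    datanode + tasktracker (same default)  ~ 2000 MB
    OS, page cache, streaming executables  >    0 MB
    -------------------------------------------------
    over the 4096 MB total before a single child task runs

which is why shrinking the daemon heaps (or adding RAM) comes before deciding
between 200MB and 512MB for the child JVMs.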
> > Oh, and one last thing.
> > I am using Hadoop streaming.
> > I read in a book that when you are using Hadoop streaming, you should
> > allocate a smaller heap to the child JVM (I am not sure if it meant less
> > than 200MB or less than 400MB), because streaming does not allow enough
> > memory for the user's processes to run.
> > So what is the optimal heap size for map/reduce tasks in Hadoop
> > streaming?
> > My plan was to increase the heap size of the child JVM to 512MB.
> > But if what the book says is true, there is no point.
> >
>
> I think the intent is to say that when you are using streaming, the
> child task is not really memory intensive, as all the work is going to
> be done by the streaming executable, and so you can experiment with a
> smaller heap for the child JVM.
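
In 0.20-era Hadoop the child heap comes from mapred.child.java.opts (the
default is -Xmx200m) and can be overridden per job. A sketch of a streaming
invocation that does so (the paths, scripts, and the 400m figure are purely
illustrative):

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*streaming*.jar \
        -D mapred.child.java.opts=-Xmx400m \
        -input  /data/in \
        -output /data/out \
        -mapper  ./my_mapper.py \
        -reducer ./my_reducer.py

Keep in mind that the streaming executable's own memory lives outside that
-Xmx, so the child JVM heap plus the mapper/reducer process both have to fit
in whatever is left per task slot.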