Re: Memory intensive jobs and JVM reuse
On 04/29/2010 11:08 AM, Danny Leshem wrote:
> David,
>
> DistributedCache distributes files across the cluster - it is not a shared
> memory cache.
> My problem is not distributing the HashMap across machines, but the fact
> that it is replicated in memory for each task (or each job, for that
> matter).

OK, sorry for the misunderstanding.

Hmmm ... well ... I thought there was a config param to control whether
each task gets launched in a new JVM, but I can't seem to find it.  A
quick look at the list of map/reduce params turned up this, though:

mapred.job.reuse.jvm.num.tasks (default: 1): How many tasks to run per
JVM. If set to -1, there is no limit.
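
If you'd rather set it per-job than cluster-wide, something like this in
the driver should do it (untested sketch, off the top of my head; the
class and job name are just placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class Driver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // -1 means no limit: tasks for this job keep reusing the same child JVM on a node
    conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);
    Job job = new Job(conf, "memory-intensive-job");
    // ... set mapper, input/output paths, then job.waitForCompletion(true)
  }
}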

Perhaps that might help?  I'm speculating here, but in theory, if you
set it to -1, then all task attempts for a given job on a given node
would run in the same JVM ... and so have access to the same static
variables.
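
To make that concrete, here's roughly the pattern I have in mind
(completely untested; the class and the lookup-loading method are just
placeholders, and I'm assuming the new org.apache.hadoop.mapreduce API).
The first task that runs in a given JVM pays the cost of building the
HashMap, and later tasks in that JVM just reuse it:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LookupMapper extends Mapper<LongWritable, Text, Text, Text> {

  // Shared by every task that runs in this child JVM.
  private static Map<String, String> lookup;

  // Lazily build the map once per JVM; synchronized just in case.
  private static synchronized Map<String, String> getLookup() throws IOException {
    if (lookup == null) {
      lookup = loadLookupTable();
    }
    return lookup;
  }

  // Placeholder for however the HashMap actually gets built
  // (e.g. read from a file shipped via DistributedCache).
  private static Map<String, String> loadLookupTable() throws IOException {
    Map<String, String> m = new HashMap<String, String>();
    // ... populate the map here ...
    return m;
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String hit = getLookup().get(value.toString());
    if (hit != null) {
      context.write(value, new Text(hit));
    }
  }
}

Note this only avoids re-loading the map once per task -- each child JVM
on each node would still hold its own copy.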

You might also want to poke around in the full list of map/reduce config
params and see if there's anything else in there that might help solve this:

http://hadoop.apache.org/common/docs/current/mapred-default.html

HTH,

DR