
Hadoop >> mail # user >> Memory mapped resources

Re: Memory mapped resources
You can use the distributed cache for memory-mapped files (they're local
to the node the tasks run on).
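
A rough sketch of that pattern, using the old-style DistributedCache API
from the Hadoop 1.x era (the HDFS path and job class are hypothetical,
and this is a fragment rather than a complete job):

```java
// Driver side: register the model in the distributed cache so the
// framework copies it to local disk once per node, not once per task.
JobConf conf = new JobConf(AnalysisJob.class);  // hypothetical job class
DistributedCache.addCacheFile(new URI("hdfs:///models/big-model.bin"), conf);

// Mapper side (in a class extending MapReduceBase): look up the
// node-local copy; it can then be opened and memory-mapped like any
// ordinary local file.
public void configure(JobConf conf) {
    try {
        Path[] cached = DistributedCache.getLocalCacheFiles(conf);
        File localModel = new File(cached[0].toString());
        // ... open and memory-map localModel here ...
    } catch (IOException e) {
        throw new RuntimeException("cannot locate cached model", e);
    }
}
```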


On Tue, Apr 12, 2011 at 10:40 AM, Benson Margulies wrote:
> Here's the OP again.
> I want to make it clear that my question here has to do with the
> problem of distributing 'the program' around the cluster, not 'the
> data'. In the case at hand, the issue is a system that has a large data
> resource that it needs in order to do its work. Every instance of the code
> needs the entire model, not just some blocks or pieces.
> Memory mapping is a very attractive tactic for this kind of data
> resource. The data is read-only. Memory-mapping it allows the
> operating system to ensure that only one copy of the thing ends up in
> physical memory.
> If we force the model into a conventional file (storable in HDFS) and
> read it into the JVM in a conventional way, then we get as many copies
> in memory as we have JVMs.  On a big machine with a lot of cores, this
> begins to add up.
> For people who are running a cluster of relatively conventional
> systems, just putting copies on all the nodes in a conventional place
> is adequate.
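
The memory-mapping tactic described above can be sketched in plain Java
(a temp file stands in for the node-local model file, which is an
assumption of this sketch): once every JVM on a node maps the same
read-only file, the operating system backs all the mappings with a
single set of physical pages in the page cache.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedModel {
    // Map the whole model file read-only. A mapping made through
    // FileChannel.map remains valid after the channel is closed.
    static MappedByteBuffer map(Path modelFile) throws IOException {
        try (FileChannel ch = FileChannel.open(modelFile, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the node-local model file.
        Path model = Files.createTempFile("model", ".bin");
        Files.write(model, new byte[] {1, 2, 3, 4});

        MappedByteBuffer buf = map(model);
        System.out.println(buf.capacity()); // prints 4
        System.out.println(buf.get(0));     // prints 1
    }
}
```

Because the mapping is READ_ONLY, any number of JVMs (or tasks) on the
node can map the same file without duplicating it in physical memory.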