Hadoop, mail # user - Memory mapped resources


Re: Memory mapped resources
Luke Lu 2011-04-12, 19:50
You can use the distributed cache for memory-mapped files (they're
local to the node the tasks run on).

http://developer.yahoo.com/hadoop/tutorial/module5.html#auxdata
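
A minimal sketch of that round trip, assuming the classic
org.apache.hadoop.filecache.DistributedCache API from Hadoop 0.20/1.x;
the job class and the HDFS path /user/benson/model.bin are made up for
illustration:

    import java.io.IOException;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;

    public class ModelJob {

      public static class ModelMapper
          extends Mapper<LongWritable, Text, Text, Text> {

        private Path localModel;  // node-local path to the cached file

        @Override
        protected void setup(Context context)
            throws IOException, InterruptedException {
          // The framework has already copied the cached file onto this
          // node's local disk; ask it where the copy landed.
          Path[] cached =
              DistributedCache.getLocalCacheFiles(context.getConfiguration());
          if (cached != null && cached.length > 0) {
            localModel = cached[0];
          }
        }

        // map() can open localModel like any ordinary local file,
        // including via java.nio memory mapping.
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Register the HDFS file before constructing the Job, which
        // clones the Configuration. The file is shipped once per node,
        // not once per task.
        DistributedCache.addCacheFile(new URI("/user/benson/model.bin"), conf);

        Job job = new Job(conf, "model job");
        job.setJarByClass(ModelJob.class);
        job.setMapperClass(ModelMapper.class);
        // ... input/output paths and formats omitted ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Since the cached copy is reused by every task on the node, each JVM
there can then open (or map) the same local file.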

On Tue, Apr 12, 2011 at 10:40 AM, Benson Margulies
<[EMAIL PROTECTED]> wrote:
> Here's the OP again.
>
> I want to make it clear that my question here has to do with the
> problem of distributing 'the program' around the cluster, not 'the
> data'. In the case at hand, the issue is a system that needs a large
> data resource to do its work. Every instance of the code needs the
> entire model, not just some blocks or pieces.
>
> Memory mapping is a very attractive tactic for this kind of data
> resource. The data is read-only, and memory-mapping it allows the
> operating system to ensure that only one copy ends up in physical
> memory (see the mapping sketch after this message).
>
> If we force the model into a conventional file (storable in HDFS) and
> read it into the JVM in a conventional way, then we get as many copies
> in memory as we have JVMs.  On a big machine with a lot of cores, this
> begins to add up.
>
> For people who are running a cluster of relatively conventional
> systems, just putting copies on all the nodes in a conventional place
> is adequate.
>
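
For the mapping itself, a minimal java.nio sketch of what Benson
describes; the path argument is hypothetical. Every JVM that maps the
same node-local file read-only shares the same physical pages through
the operating system's page cache:

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class ModelMap {
      public static MappedByteBuffer mapModel(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r");
             FileChannel channel = raf.getChannel()) {
          // Map the whole file read-only. Pages are faulted in on demand
          // and backed by the page cache, so N JVMs on a node do not mean
          // N copies of the model in RAM. The buffer stays valid after
          // the channel is closed.
          return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
      }
    }

One caveat: FileChannel.map() caps a single mapping at Integer.MAX_VALUE
bytes, so a model larger than 2 GB has to be split across several
MappedByteBuffers.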