Re: Memory mapped resources
On April 12, 2011 21:50:07 Luke Lu wrote:
> You can use distributed cache for memory mapped files (they're local
> to the node the tasks run on.)
>
> http://developer.yahoo.com/hadoop/tutorial/module5.html#auxdata
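
For reference, a minimal sketch of the distributed cache setup described in the
tutorial linked above, using the org.apache.hadoop.filecache.DistributedCache
API that was current at the time; the HDFS path and class name are illustrative,
not taken from this thread:

import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class IndexCacheSetup {

    // Driver side: register the index (illustrative HDFS path) so the
    // framework copies it once to the local disk of every task node.
    public static void addIndexToCache(Configuration conf) throws IOException {
        DistributedCache.addCacheFile(URI.create("/models/index.bin"), conf);
    }

    // Task side (e.g. in the mapper's configure()/setup()): locate the
    // node-local copy that the framework placed on local disk.
    public static Path localIndexPath(Configuration conf) throws IOException {
        Path[] cached = DistributedCache.getLocalCacheFiles(conf);
        return cached[0];
    }
}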

We adopted this solution for a similar problem.  For a program we developed,
each mapper needed to access (read-only) an index about 4 GB in size.  We
distributed the index to each node with the distributed cache, and then
accessed it with mmap.  The 4 GB are loaded into memory once per node, but
shared by all the map tasks on that node.  The mapper is written in C, so we can call mmap
directly.  In Java you may be able to get the same effect with
java.nio.channels.FileChannel.
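
A minimal sketch of that FileChannel approach, assuming the index has already
been placed on local disk (class name and path handling are illustrative).
Note that a single MappedByteBuffer is capped at about 2 GB, so a 4 GB index
has to be mapped as several read-only slices:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;

public class MappedIndex {

    // A single MappedByteBuffer is limited to Integer.MAX_VALUE bytes, so a
    // multi-GB file has to be mapped as a list of read-only slices.
    private static final long SLICE = Integer.MAX_VALUE;

    public static List<MappedByteBuffer> mapReadOnly(String path) throws IOException {
        List<MappedByteBuffer> slices = new ArrayList<MappedByteBuffer>();
        RandomAccessFile raf = new RandomAccessFile(path, "r");
        try {
            FileChannel channel = raf.getChannel();
            long size = channel.size();
            for (long pos = 0; pos < size; pos += SLICE) {
                long len = Math.min(SLICE, size - pos);
                // READ_ONLY mappings share pages through the OS page cache,
                // so every JVM on the node sees the same physical copy.
                slices.add(channel.map(FileChannel.MapMode.READ_ONLY, pos, len));
            }
        } finally {
            raf.close();  // the mappings remain valid after the channel is closed
        }
        return slices;
    }
}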

Luca
> On Tue, Apr 12, 2011 at 10:40 AM, Benson Margulies <[EMAIL PROTECTED]> wrote:
> > Here's the OP again.
> >
> > I want to make it clear that my question here has to do with the
> > problem of distributing 'the program' around the cluster, not 'the
> > data'. In the case at hand, the issue is a system that has a large data
> > resource that it needs to do its work. Every instance of the code
> > needs the entire model. Not just some blocks or pieces.
> >
> > Memory mapping is a very attractive tactic for this kind of data
> > resource. The data is read-only. Memory-mapping it allows the
> > operating system to ensure that only one copy of the thing ends up in
> > physical memory.
> >
> > If we force the model into a conventional file (storable in HDFS) and
> > read it into the JVM in a conventional way, then we get as many copies
> > in memory as we have JVMs.  On a big machine with a lot of cores, this
> > begins to add up.
> >
> > For people who are running a cluster of relatively conventional
> > systems, just putting copies on all the nodes in a conventional place
> > is adequate.

--
Luca Pireddu
CRS4 - Distributed Computing Group
Loc. Pixina Manna Edificio 1
Pula 09010 (CA), Italy
Tel:  +39 0709250452