Hadoop, mail # user - Memory mapped resources

Re: Memory mapped resources
Ted Dunning 2011-04-12, 19:05
Actually, it doesn't become trivial.  It just becomes total fail or total
win instead of almost always being partial win.  It doesn't meet Benson's

On Tue, Apr 12, 2011 at 11:09 AM, Jason Rutherglen <

> To get around the chunks or blocks problem, I've been implementing a
> system that simply sets a max block size that is too large for a file
> to reach.  In this way there will only be one block per HDFS file, and
> so MMap'ing or other single file ops become trivial.
> On Tue, Apr 12, 2011 at 10:40 AM, Benson Margulies
> <[EMAIL PROTECTED]> wrote:
> > Here's the OP again.
> >
> > I want to make it clear that my question here has to do with the
> > problem of distributing 'the program' around the cluster, not 'the
> > data'. In the case at hand, the issue is a system that has a large data
> > resource that it needs to do its work. Every instance of the code
> > needs the entire model. Not just some blocks or pieces.
> >
> > Memory mapping is a very attractive tactic for this kind of data
> > resource. The data is read-only. Memory-mapping it allows the
> > operating system to ensure that only one copy of the thing ends up in
> > physical memory.
> >
> > If we force the model into a conventional file (storable in HDFS) and
> > read it into the JVM in a conventional way, then we get as many copies
> > in memory as we have JVMs.  On a big machine with a lot of cores, this
> > begins to add up.
> >
> > For people who are running a cluster of relatively conventional
> > systems, just putting copies on all the nodes in a conventional place
> > is adequate.
> >
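For reference, the memory-mapping tactic described in the quoted message can be sketched in plain Java NIO. This is only a minimal illustration, assuming the model already sits in an ordinary local file on each node (not in HDFS); the class name SharedModelMap and the method mapReadOnly are hypothetical names made up for this sketch. Because the mapping is read-only and file-backed, the operating system can serve the same physical pages from its page cache to every JVM on the machine that maps the file.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Hadoop: maps a local, read-only model file.
public class SharedModelMap {

    // Maps the whole file read-only. A single mapping is limited to
    // Integer.MAX_VALUE bytes, so a larger model would need several mappings.
    static MappedByteBuffer mapReadOnly(Path modelFile) throws IOException {
        try (FileChannel ch = FileChannel.open(modelFile, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real model: a tiny temporary file (assumption for the demo).
        Path model = Files.createTempFile("model", ".bin");
        model.toFile().deleteOnExit();
        Files.write(model, new byte[] {1, 2, 3, 4});

        MappedByteBuffer buf = mapReadOnly(model);
        System.out.println(buf.capacity() + " bytes mapped, first byte = " + buf.get(0));
    }
}
```

Each task JVM would call mapReadOnly on the same local path. Reading the file into a byte[] instead would put one private copy on each JVM's heap, which is the per-JVM duplication the quoted message is trying to avoid.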