NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Re: Memory mapped resources
Kevin,

You present a good discussion of architectural alternatives here.  But my
comment really had more to do with whether a particular HDFS patch would
provide what the original poster seemed to be asking about.  This is
especially pertinent since the patch was intended to scratch a different
itch.

On Tue, Apr 12, 2011 at 5:51 AM, <[EMAIL PROTECTED]> wrote:

> This is the age-old argument of what to share in a partitioned
> environment. IBM and Teradata have always used "shared nothing", which is
> what Hadoop does when each node gets only one chunk of the file.
> Oracle has always used "shared disk", which is not an easy thing to do,
> especially at scale, and seems to give varying results depending on the
> application (transaction processing or DSS). Here are a couple of web
> references.
>
> http://www.informatik.uni-trier.de/~ley/db/conf/vldb/Bhide88.html
>
> http://jhingran.typepad.com/anant_jhingrans_musings/2010/02/shared-nothing-vs-shared-disks-the-cloud-sequel.html
>
> Rather than say shared nothing isn't useful, Hadoop should look at how
> others make it work. The two key problems to avoid are data skew, where
> one node sees too much data and becomes the slow node, and large
> cross-partition joins, where large data is needed from more than one
> partition and potentially gets copied around.
>
> Rather than hybridizing into shared disk, I think Hadoop should hybridize
> into the shared-data solution others use, replication of select data,
> to solve cross-partition joins in a "shared nothing" architecture.
> This may be more database-oriented, and something HBase could address,
> but I think it is good background for the question of memory mapping
> files in Hadoop.
>
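To make the data-skew problem concrete: in a shared-nothing system each record is routed to one partition (here by hashing its key), and a job finishes only when its busiest partition does. The Java sketch below is illustrative only; the class and method names are hypothetical, and it assumes run time is proportional to the number of records a partition receives, in which case the max-to-mean ratio is the slowdown relative to a perfectly even split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkewCheck {
    // Routes each key to one of numPartitions partitions by hash, as a
    // shared-nothing system would, and counts the records each one receives.
    static long[] partitionCounts(List<String> keys, int numPartitions) {
        long[] counts = new long[numPartitions];
        for (String key : keys) {
            // floorMod keeps the index non-negative even for negative hash codes
            counts[Math.floorMod(key.hashCode(), numPartitions)]++;
        }
        return counts;
    }

    // Busiest partition divided by the average partition; 1.0 means perfectly even.
    static double skewRatio(long[] counts) {
        long max = 0;
        long total = 0;
        for (long c : counts) {
            max = Math.max(max, c);
            total += c;
        }
        double mean = (double) total / counts.length;
        return mean == 0 ? 1.0 : max / mean;
    }

    public static void main(String[] args) {
        // Hypothetical input: one hot key dominates, so its partition is overloaded.
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 90; i++) keys.add("popular");
        for (int i = 0; i < 10; i++) keys.add("key" + i);
        long[] counts = partitionCounts(keys, 4);
        System.out.println(Arrays.toString(counts) + " skew ratio = " + skewRatio(counts));
    }
}
```

A ratio well above 1.0 means one node is doing most of the work, which is the "slow node" effect described above.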
> Kevin
>
>
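One common way to realize "replication of select data" for joins is a replicated (broadcast) join, often called a map-side join in MapReduce: a full copy of the small table is shipped to every partition, so each partition of the large table joins locally and no large data is copied between partitions. The following Java sketch assumes the small table fits in memory on every node; the table contents and names are hypothetical, and it is not presented as Kevin's exact proposal or as any Hadoop API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReplicatedJoin {
    // A row of the large, partitioned table: (customerId, amount).
    record Order(String customerId, int amount) {}

    // Joins one partition of the large table against a full local copy of the
    // small table. Every partition gets the same replicated copy, so the join
    // needs no data from any other partition.
    static List<String> joinPartition(List<Order> partition, Map<String, String> replicatedCustomers) {
        List<String> out = new ArrayList<>();
        for (Order o : partition) {
            String name = replicatedCustomers.get(o.customerId());
            if (name != null) { // inner join: orders with no matching customer are dropped
                out.add(name + ":" + o.amount());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Small table, replicated to every partition (hypothetical data).
        Map<String, String> customers = new HashMap<>();
        customers.put("c1", "Alice");
        customers.put("c2", "Bob");

        // Two partitions of the large table, each processed independently.
        List<List<Order>> partitions = List.of(
            List.of(new Order("c1", 10), new Order("c3", 5)),
            List.of(new Order("c2", 7), new Order("c1", 3)));

        for (List<Order> p : partitions) {
            System.out.println(joinPartition(p, customers));
        }
    }
}
```

The trade-off is memory and network cost for copying the small table everywhere, which is why this only fits "select" data rather than both sides of a large join.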
> -----Original Message-----
> From: Ted Dunning [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, April 12, 2011 12:09 AM
> To: Jason Rutherglen
> Cc: [EMAIL PROTECTED]; Edward Capriolo
> Subject: Re: Memory mapped resources
>
> Yes.  But only one such block. That is what I meant by chunk.
>
> That is fine if you want that chunk, but if you want to mmap the entire
> file, it isn't really useful.
>
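To illustrate the "chunk" point: HDFS stores a file as a sequence of fixed-size blocks, each replicated on some subset of nodes, so a node that holds one block locally has only that byte range of the file. The Java sketch below just computes the block boundaries; the 64 MB block size and 200 MB file length are assumptions for illustration, and the names are hypothetical rather than part of the HDFS API.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockLayout {
    // Hypothetical description of one block's byte range within a file.
    record Block(int index, long offset, long length) {}

    // Splits a file of fileLength bytes into fixed-size blocks: every block is
    // blockSize bytes except possibly the last one.
    static List<Block> split(long fileLength, long blockSize) {
        List<Block> blocks = new ArrayList<>();
        int index = 0;
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            blocks.add(new Block(index++, offset, Math.min(blockSize, fileLength - offset)));
        }
        return blocks;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;   // assumed block size
        long fileLength = 200L * 1024 * 1024; // hypothetical file length
        for (Block b : split(fileLength, blockSize)) {
            System.out.println(b);
        }
    }
}
```

With these numbers the file spans four blocks (the last one 8 MB), so a node holding only block 1 can map bytes 64 MB through 128 MB and nothing else. Mapping the entire file needs more than local-block access.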
> On Mon, Apr 11, 2011 at 6:48 PM, Jason Rutherglen <[EMAIL PROTECTED]> wrote:
>
> > What do you mean by local chunk?  I think it's providing access to the
> > underlying file block?
> >
> > On Mon, Apr 11, 2011 at 6:30 PM, Ted Dunning <[EMAIL PROTECTED]>
> > wrote:
> > > Also, it only provides access to a local chunk of a file, which isn't
> > > very useful.
> > >
> > > On Mon, Apr 11, 2011 at 5:32 PM, Edward Capriolo
> > > <[EMAIL PROTECTED]> wrote:
> > >>
> > >> On Mon, Apr 11, 2011 at 7:05 PM, Jason Rutherglen
> > >> <[EMAIL PROTECTED]> wrote:
> > >> > Yes, you can; however, it will require customization of HDFS. Take a
> > >> > look at HDFS-347, specifically the HDFS-347-branch-20-append.txt
> > >> > patch. I have been altering it for use with HBASE-3529. Note that
> > >> > this patch is for the -append branch, which is mainly for HBase.
> > >> >
> > >> > On Mon, Apr 11, 2011 at 3:57 PM, Benson Margulies
> > >> > <[EMAIL PROTECTED]> wrote:
> > >> >> We have some very large files that we access via memory mapping in
> > >> >> Java. Someone has asked us how to make this conveniently deployable
> > >> >> in Hadoop. If we tell them to put the files into HDFS, can we obtain
> > >> >> a File for the underlying file on any given node?
> > >> >>
> > >> >
> > >>
> > >> This feature is not yet part of Hadoop, so doing this is not
> > >> "convenient".
> > >
> > >
> >
>
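For reference, memory-mapping a large local file in Java goes through FileChannel.map. A single MappedByteBuffer is limited to Integer.MAX_VALUE bytes, so a file larger than about 2 GB has to be mapped as several windows. The sketch below assumes the complete file is available at a local path, which, as the thread notes, stock HDFS does not provide; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class LargeFileMapper {
    // Maps the whole file as consecutive read-only windows of windowSize bytes,
    // since one MappedByteBuffer cannot exceed Integer.MAX_VALUE bytes.
    static List<MappedByteBuffer> mapWhole(Path file, long windowSize) throws IOException {
        if (windowSize <= 0 || windowSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("windowSize must be in (0, Integer.MAX_VALUE]");
        }
        List<MappedByteBuffer> windows = new ArrayList<>();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            for (long pos = 0; pos < size; pos += windowSize) {
                long len = Math.min(windowSize, size - pos);
                // A mapping stays valid after the channel that created it is closed.
                windows.add(ch.map(FileChannel.MapMode.READ_ONLY, pos, len));
            }
        }
        return windows;
    }

    // Reads the byte at an absolute file offset by locating its window.
    static byte byteAt(List<MappedByteBuffer> windows, long windowSize, long offset) {
        int w = (int) (offset / windowSize);
        return windows.get(w).get((int) (offset % windowSize));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-demo", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[] {1, 2, 3, 4, 5, 6, 7});
        List<MappedByteBuffer> windows = mapWhole(tmp, 3); // tiny windows for the demo
        System.out.println(windows.size() + " windows, byte at offset 5 = " + byteAt(windows, 3, 5));
    }
}
```

In real use the window size would be large (for example, Integer.MAX_VALUE or a round power of two below it); the tiny value here only exercises the window arithmetic.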