Benson Margulies 2011-04-11, 22:57
Jason Rutherglen 2011-04-11, 23:05
Edward Capriolo 2011-04-12, 00:32
Ted Dunning 2011-04-12, 01:30
Jason Rutherglen 2011-04-12, 01:48
Ted Dunning 2011-04-12, 04:09
Kevin.Leach@... 2011-04-12, 12:51
Re: Memory mapped resources
Ted Dunning 2011-04-12, 15:07
You present a good discussion of architectural alternatives here. But my
comment really had more to do with whether a particular HDFS patch would
provide what the original poster seemed to be asking about. This is
especially pertinent since the patch was intended to scratch a different itch.
On Tue, Apr 12, 2011 at 5:51 AM, <[EMAIL PROTECTED]> wrote:
> This is the age-old argument of what to share in a partitioned
> environment. IBM and Teradata have always used "shared nothing", which is
> what getting only one chunk of the file on each Hadoop node amounts to.
> Oracle has always used "shared disk", which is not an easy thing to do,
> especially at scale, and seems to have varying results depending on the
> application (transaction processing or DSS). Here are a couple of web
> references.
> Rather than say shared nothing isn't useful, Hadoop should look at how
> others make this work. The two key problems to avoid are data skew, where
> one node sees too much data and becomes the slow node, and large
> intra-partition joins, where large data is needed from more than one
> partition and potentially gets copied around.
> Rather than hybridizing into shared disk, I think Hadoop should hybridize
> into the shared-data solutions others use: replication of select data
> to solve intra-partition joins in a "shared nothing" architecture.
> This may be more database terminology that could be addressed by HBase,
> but I think it is good background for the questions of memory mapping
> files in Hadoop.
> -----Original Message-----
> From: Ted Dunning [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, April 12, 2011 12:09 AM
> To: Jason Rutherglen
> Cc: [EMAIL PROTECTED]; Edward Capriolo
> Subject: Re: Memory mapped resources
> Yes, but only one such block. That is what I meant by chunk.
> That is fine if you want that chunk, but if you want to mmap the entire
> file it isn't really useful.
> On Mon, Apr 11, 2011 at 6:48 PM, Jason Rutherglen <[EMAIL PROTECTED]> wrote:
> > What do you mean by local chunk? I think it's providing access to the
> > underlying file block?
> > On Mon, Apr 11, 2011 at 6:30 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> > > Also, it only provides access to a local chunk of a file, which isn't
> > > useful.
> > >
> > > On Mon, Apr 11, 2011 at 5:32 PM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
> > >>
> > >> On Mon, Apr 11, 2011 at 7:05 PM, Jason Rutherglen <[EMAIL PROTECTED]> wrote:
> > >> > Yes, you can; however, it will require customization of HDFS. Take a
> > >> > look at HDFS-347, specifically the HDFS-347-branch-20-append.txt patch;
> > >> > I have been altering it for use with HBASE-3529. Note that the patch
> > >> > noted is for the -append branch, which is mainly for HBase.
> > >> >
> > >> > On Mon, Apr 11, 2011 at 3:57 PM, Benson Margulies <[EMAIL PROTECTED]> wrote:
> > >> >> We have some very large files that we access via memory mapping in
> > >> >> Java. Someone's asked us how to make this conveniently
> > >> >> deployable in Hadoop. If we tell them to put the files into HDFS, can
> > >> >> we obtain a File for the underlying file on any given node?
> > >> >>
> > >> >
> > >>
> > >> This feature is not yet part of Hadoop, so doing this is not "convenient".
> > >
> > >
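Kevin's suggestion of replicating select data to avoid moving large data between partitions can be illustrated with a broadcast (replicated) hash join. The Java below is a minimal sketch under stated assumptions, not anything Hadoop or HBase provides: it assumes the small table fits in memory on every node, and the class ReplicatedJoin, its Order row type, and the joinPartition/joinAll methods are hypothetical names chosen for illustration. Each partition receives its own full copy of the small table, so its rows are joined locally and the large table never moves.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a replicated (broadcast) join in a shared-nothing layout.
final class ReplicatedJoin {
    // One row of the large, partitioned table (hypothetical schema).
    static final class Order {
        final String customerId;
        final long amountCents;

        Order(String customerId, long amountCents) {
            this.customerId = customerId;
            this.amountCents = amountCents;
        }
    }

    // Inner-joins one partition's rows against its local copy of the small table.
    static List<String> joinPartition(List<Order> localOrders, Map<String, String> customers) {
        List<String> out = new ArrayList<>();
        for (Order o : localOrders) {
            String name = customers.get(o.customerId);
            if (name != null) { // rows with no matching key are dropped (inner join)
                out.add(name + ":" + o.amountCents);
            }
        }
        return out;
    }

    // Simulates every partition (node) receiving its own replica of the small table.
    static List<List<String>> joinAll(List<List<Order>> partitions, Map<String, String> smallTable) {
        List<List<String>> results = new ArrayList<>();
        for (List<Order> partition : partitions) {
            Map<String, String> replica = new HashMap<>(smallTable); // the "replication" step
            results.add(joinPartition(partition, replica));
        }
        return results;
    }
}
```

This only pays off when the replicated side is small. It does nothing for the data-skew problem Kevin also raises, since each partition still processes however many large-table rows it happens to hold.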
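For reference, the memory mapping Benson describes looks roughly like the Java below once a local path to the whole file exists (for example, a copy on local disk). Obtaining such a path from HDFS is exactly what the thread says is not yet convenient, so this sketch simply assumes one. It uses the standard NIO call FileChannel.map. A single MappedByteBuffer is indexed by int, so a file larger than Integer.MAX_VALUE bytes has to be mapped as several regions. The class name ChunkedMmap and its regionSize parameter are hypothetical.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: maps a whole local file read-only as a list of regions,
// because one MappedByteBuffer can cover at most Integer.MAX_VALUE bytes.
final class ChunkedMmap {
    private final List<MappedByteBuffer> regions = new ArrayList<>();
    private final long regionSize;
    private final long length;

    ChunkedMmap(Path file, long regionSize) throws IOException {
        if (regionSize <= 0 || regionSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("regionSize must be in 1..Integer.MAX_VALUE");
        }
        this.regionSize = regionSize;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            this.length = channel.size();
            for (long pos = 0; pos < length; pos += regionSize) {
                long size = Math.min(regionSize, length - pos);
                // A mapping stays valid after the channel that created it is closed.
                regions.add(channel.map(FileChannel.MapMode.READ_ONLY, pos, size));
            }
        }
    }

    long length() {
        return length;
    }

    int regionCount() {
        return regions.size();
    }

    // Reads one byte at an absolute file offset without moving any buffer position.
    byte get(long offset) {
        if (offset < 0 || offset >= length) {
            throw new IndexOutOfBoundsException("offset " + offset);
        }
        return regions.get((int) (offset / regionSize)).get((int) (offset % regionSize));
    }
}
```

Applied to a single local HDFS block file instead of the whole file, the same code would only expose that block's byte range, which is the limitation Ted points out.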
Jason Rutherglen 2011-04-12, 13:32
Ted Dunning 2011-04-12, 15:08
Jason Rutherglen 2011-04-12, 15:24
Ted Dunning 2011-04-12, 15:35
Benson Margulies 2011-04-12, 17:40
Jason Rutherglen 2011-04-12, 18:09
Ted Dunning 2011-04-12, 19:05
Luke Lu 2011-04-12, 19:50
Luca Pireddu 2011-04-13, 07:21
M. C. Srivas 2011-04-13, 02:16
Ted Dunning 2011-04-13, 04:09
Benson Margulies 2011-04-13, 10:54
M. C. Srivas 2011-04-13, 14:33
Benson Margulies 2011-04-13, 14:35
Lance Norskog 2011-04-14, 02:41
Michael Flester 2011-04-12, 14:06