HDFS, mail # dev - [DISCUSS] Remove append?


Re: [DISCUSS] Remove append?
Sanjay Radia 2012-03-21, 20:57
On Tue, Mar 20, 2012 at 5:37 PM, Eli Collins <[EMAIL PROTECTED]> wrote:

>
>
> Append introduces non-trivial design and code complexity, which is not
> worth the cost if we don't have real users.

The bulk of the complexity of HDFS-265 ("the new Append") was around
hflush, concurrent readers, the pipeline, etc. The code and complexity for
appending to a previously closed file were not that large.

> Removing append means we
> have the property that HDFS blocks, when finalized, are immutable.
> This significantly simplifies the design and code, which significantly
> simplifies the implementation of other features like snapshots,
> HDFS-level caching, dedupe, etc.
>

While snapshots are challenging with append, the problem is solvable: the
snapshot needs to remember the length of the file. (We have a working
prototype and will be posting the design and the code soon.)
I agree that the notion of an immutable file is useful, since it lets the
system and tools optimize certain things. A Xerox PARC file system in the
1980s had this feature, and the system exploited it. I would support adding
the notion of an immutable file to Hadoop.
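
To make the snapshot point concrete, here is a rough sketch of the idea that
a snapshot of an append-only file only has to remember the file's length.
This is not the prototype mentioned above and not HDFS code: the class and
method names (SnapshotLengthSketch, AppendOnlyFile, Snapshot, readPrefix) are
made up for illustration, and the file is modelled as an in-memory byte
buffer.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; none of these names are HDFS classes.
class SnapshotLengthSketch {

    /** An append-only file: bytes that were already written never change. */
    static final class AppendOnlyFile {
        private final ByteArrayOutputStream data = new ByteArrayOutputStream();

        void append(byte[] bytes) {
            data.write(bytes, 0, bytes.length);
        }

        long length() {
            return data.size();
        }

        /** Returns a copy of the first len bytes of the file. */
        byte[] readPrefix(long len) {
            return Arrays.copyOf(data.toByteArray(), (int) len);
        }

        /** Taking a snapshot copies no data; it records the current length. */
        Snapshot snapshot() {
            return new Snapshot(this, length());
        }
    }

    /** A snapshot is a reference to the file plus the length it had then. */
    static final class Snapshot {
        private final AppendOnlyFile file;
        private final long lengthAtSnapshot;

        Snapshot(AppendOnlyFile file, long lengthAtSnapshot) {
            this.file = file;
            this.lengthAtSnapshot = lengthAtSnapshot;
        }

        /** Reads through the snapshot stop at the recorded length. */
        byte[] read() {
            return file.readPrefix(lengthAtSnapshot);
        }
    }

    public static void main(String[] args) {
        AppendOnlyFile f = new AppendOnlyFile();
        f.append("hello".getBytes(StandardCharsets.UTF_8));
        Snapshot s = f.snapshot();
        f.append(" world".getBytes(StandardCharsets.UTF_8)); // after the snapshot

        System.out.println(new String(s.read(), StandardCharsets.UTF_8)); // hello
        System.out.println(f.length()); // 11
    }
}
```

Because appends only add bytes past the recorded length, the snapshot view
stays stable without copying any data. A real implementation would also have
to deal with block boundaries, replicas and concurrent writers, which this
sketch ignores.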
sanjay