NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
HDFS >> mail # dev >> [DISCUSS] Remove append?

Re: [DISCUSS] Remove append?
On Tue, Mar 20, 2012 at 5:37 PM, Eli Collins <[EMAIL PROTECTED]> wrote:

>
>
> Append introduces non-trivial design and code complexity, which is not
> worth the cost if we don't have real users.

The bulk of the complexity of HDFS-265 ("the new Append") was around
hflush, concurrent readers, the write pipeline, etc. The code and complexity
for appending to a previously closed file were not that large.

> Removing append means we
> have the property that HDFS blocks, when finalized, are immutable.
> This significantly simplifies the design and code, which significantly
> simplifies the implementation of other features like snapshots,
> HDFS-level caching, dedupe, etc.
>

While snapshots are challenging with append, the problem is solvable: the
snapshot needs to remember the length of the file. (We have a working
prototype; we will be posting the design and the code soon.)
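To illustrate why remembering the length is enough, here is a toy in-memory model. It is a sketch under stated assumptions, not the prototype mentioned above and not HDFS code: `AppendableFile`, `snapshot`, and `read` are hypothetical names. It assumes append only ever adds bytes at the end (existing bytes are never rewritten), and it ignores blocks, hflush, and a partially written last block.

```python
from typing import Dict, Optional


class AppendableFile:
    """Hypothetical toy model of an append-only file; not HDFS code."""

    def __init__(self) -> None:
        self._data = bytearray()
        # Snapshot name -> file length at the moment the snapshot was taken.
        self._snapshots: Dict[str, int] = {}

    def append(self, chunk: bytes) -> None:
        # Appends only add bytes at the end; earlier bytes are never modified.
        self._data.extend(chunk)

    def snapshot(self, name: str) -> None:
        # Taking a snapshot copies no data; it only records the current length.
        self._snapshots[name] = len(self._data)

    def read(self, snapshot: Optional[str] = None) -> bytes:
        # Reading through a snapshot clamps the visible range to the recorded
        # length, so bytes appended after the snapshot stay hidden.
        end = len(self._data) if snapshot is None else self._snapshots[snapshot]
        return bytes(self._data[:end])
```

For example, after `append(b"hello")`, `snapshot("s1")`, and `append(b" world")`, `read("s1")` still returns `b"hello"` while `read()` returns `b"hello world"`. The trick only works because the prefix recorded by the snapshot is never overwritten.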
I agree that the notion of an immutable file is useful, since it lets the
system and tools optimize certain things. A Xerox PARC file system in the
1980s had this feature, and the system exploited it. I would support adding
the notion of an immutable file to Hadoop.
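A minimal sketch of the kind of optimization an immutable-file flag enables, again a hypothetical Python toy rather than any existing Hadoop API: `ToyFile`, `make_immutable`, `ImmutableFileError`, and `cached_read` are illustrative names. Once the one-way flag is set, appends are rejected, so a cache keyed only by path can never return stale bytes for that file.

```python
from typing import Dict


class ImmutableFileError(Exception):
    """Hypothetical error raised when writing to an immutable file."""


class ToyFile:
    """Hypothetical toy file with a one-way immutability flag; not Hadoop code."""

    def __init__(self, data: bytes = b"") -> None:
        self._data = bytearray(data)
        self._immutable = False

    @property
    def immutable(self) -> bool:
        # Read-only property: no method of this class ever clears the flag.
        return self._immutable

    def make_immutable(self) -> None:
        self._immutable = True

    def append(self, chunk: bytes) -> None:
        if self._immutable:
            raise ImmutableFileError("file is immutable; append rejected")
        self._data.extend(chunk)

    def content(self) -> bytes:
        return bytes(self._data)


def cached_read(cache: Dict[str, bytes], path: str, f: ToyFile) -> bytes:
    # Only immutable files are cached: their bytes can never change, so the
    # cached copy cannot go stale. Mutable files are always read fresh.
    if f.immutable:
        if path not in cache:
            cache[path] = f.content()
        return cache[path]
    return f.content()
```

The design choice is that the tool does not need any invalidation protocol; it simply checks the flag and skips caching for files that may still grow.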
sanjay