HDFS >> mail # dev >> [DISCUSS] Remove append?


Re: [DISCUSS] Remove append?
On Mar 20, 2012, at 7:37 PM, Eli Collins wrote:
> Hey gang,
>
> I'd like to get people's thoughts on the following proposal. I think
> we should consider removing append from HDFS.
>
> Where we are today.. append was added in the 0.17-19 releases
> (HADOOP-1700) and subsequently disabled (HADOOP-5224) due to quality
> issues. It and sync were re-designed, re-implemented, and shipped in
> 21.0 (HDFS-265). To my knowledge, there has been no real production
> use. Anecdotally people who worked on branch-20-append have told me
> they think the new trunk code is substantially less well-tested than
> the branch-20-append code (at least for sync, append was never well
> tested). It has certainly gotten way less pounding from HBase users.
> The design however, is much improved, and people think we can get
> hsync (and append) stabilized in trunk (mostly testing and bug
> fixing).

Up front:  I think append is a needed feature.

Politely speaking, I think the premise of the question is a bit dubious because it is circular, i.e., it's not used in production, so is it worth keeping? The stigma/perception that append has been unstable and is not well-tested is itself a compelling reason for major installations not to run it in production. The situation is going to be akin to "You go first. No, you go first! No way, you go first!".

Downstream projects also aren't going to use something until it's stable, so they either work around the limitation, or... they choose something other than HDFS. There's also the unanswerable question of how many potential users have been silently lost. We are unlikely to have heard the demand from those who chose another solution. Generally, for every complaint or request we do hear, some large number N of other people didn't even bother.

I envision a day when HDFS is a performant POSIX filesystem. Dropping append sets us back from that goal. Admittedly, I don't know all the intricacies of how append was implemented and why it is/was difficult. Is the complexity maybe due to "bolting" append onto code that wasn't designed with mutability in mind? (That's truly a question, not a statement.) If so, perhaps a refactoring would simplify the code?

Dropping append also might be used as a cudgel against HDFS. Cynically speaking, do we want to risk marketeers from certain competitors saying or implying: Trust your data with us because we're so brilliant that we have a feature HDFS has repeatedly tried and failed to implement!

Daryn