HDFS >> mail # dev >> [DISCUSS] Remove append?


Re: [DISCUSS] Remove append?
On Mon, Mar 26, 2012 at 1:55 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote:
>> Just one comment: If we do decide to keep append in, we should get it
>> to be actually stable and usable.  In my opinion, this should
>> definitely happen before adding any new operations.
>
> @Colin, append is currently stable and, of course, usable.  Many people in different organizations have tested it
> at small and large scale.  However, it is not yet in a stable release and so it is not yet heavily used.

The append unit test failed on me recently on Jenkins.  It's possible
that this was due to a Jenkins timeout, or something, but I assumed it
was due to instability at the time.  If it happens again, I'll be sure
to check the backtrace and file a JIRA if needed.

>> I agree that the notion of an immutable file is useful since it lets the
>> system and tools optimize certain things.  A Xerox PARC file system in the
>> 80s had this feature, which the system exploited. I would support adding the
>> notion of an immutable file to Hadoop.

I think Eli was hoping that making files immutable would make the
system simpler, and hopefully, less buggy.  You won't get that benefit
if only certain files are immutable.  In fact, quite the contrary--
you'll just be adding more complexity.

I'd also like to see what the "certain things" are that having certain
files, but not others, be immutable would allow you to optimize.  The
thread you linked to from the JIRA has no information on this.

I am aware of at least two "filesystems" (in the loose sense of the
word) that have immutable files.  One is Venti from Plan 9, and the
other is git, by Linus Torvalds.  Both of them are significantly
simpler because of their invariant that files cannot change.  However,
both of them are append-only, meaning that files can never be deleted.
This seems unsuitable for the HDFS use case, and in fact, I see no
reason to believe that having some, but not all, files be immutable
would provide any benefit.
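
To make the "simpler because of the invariant" point concrete, here is a
rough toy sketch of an append-only, content-addressed store in the spirit
of Venti and git.  It is only an illustration, not how Venti, git, or HDFS
is actually implemented: the class and method names are made up, and
keying each block by the SHA-1 of its bytes is a simplifying assumption
(git, for one, hashes a header along with the content).

```python
import hashlib


class ContentAddressedStore:
    """Toy append-only store (hypothetical): blocks are keyed by their SHA-1."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        # Storing the same bytes twice is a no-op, and an existing block is
        # never overwritten, so there is no update path to get wrong.
        self._blocks.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        data = self._blocks[key]
        # Because blocks never change, the key doubles as an integrity check.
        if hashlib.sha1(data).hexdigest() != key:
            raise ValueError("corrupt block")
        return data


store = ContentAddressedStore()
key = store.put(b"some block")
assert store.put(b"some block") == key   # identical content, identical key
assert store.get(key) == b"some block"
```

Note that there is no update or delete method at all, which is where the
simplicity comes from, and also why this model looks like a poor fit for
HDFS.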

Feel free to prove me wrong if you think of something, though!

cheers,
Colin
>
> @Sanjay, I filed HDFS-3154.
>
> @Eli and others, it turns out that the discussion is very useful!  Thanks.
>
> Nicholas