Re: append feature in 1.0.X - current stable version
Harsh J 2012-05-15, 04:33
On Tue, May 15, 2012 at 3:55 AM, <[EMAIL PROTECTED]> wrote:
> Hello All,
> Does anyone know whether all the issues with the append functionality have been fixed in the latest stable version of Hadoop (1.0.X)?
The 0.20-append branch introduced two client-end calls:
1. append() - This is still known to be broken. It lets you reopen
existing files and add data to them.
2. sync() - This works reliably. It lets you immediately flush your
writer's data to the DataNodes so that new readers can read it
properly (without your having to close the file).
The first is still broken in 1.0 and has some odd bugs that surface
in certain edge cases. It is highly recommended not to use it.
The second is what HBase, Flume, etc. use, and it works pretty nicely.
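To illustrate the "flush without closing" behaviour that sync() provides, here is a minimal sketch using only the standard Java library on a local temp file. It is an analogy, not the HDFS API: the class and method names are made up for this example, and the local flush() merely stands in for the sync() call a writer would make on its HDFS output stream. It assumes the record is smaller than the default BufferedOutputStream buffer.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class: a local-filesystem analogy, not Hadoop code.
class FlushVisibilityDemo {

    // Writes one record through a buffered writer and reports how many bytes
    // an outside observer sees before and after the flush, while the writer
    // is still open. Assumes the record fits in the writer's buffer.
    static long[] visibleBytesBeforeAndAfterFlush(String record) {
        byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
        try {
            Path file = Files.createTempFile("flush-demo", ".log");
            try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(file))) {
                out.write(bytes);
                long before = Files.size(file); // data still sits in the writer's buffer
                out.flush();                    // local stand-in for sync()
                long after = Files.size(file);  // data is now visible; writer not closed
                return new long[] {before, after};
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] seen = visibleBytesBeforeAndAfterFlush("event-1\n");
        System.out.println("before flush: " + seen[0] + ", after flush: " + seen[1]);
    }
}
```

The point of the sketch is only the ordering: write, flush, and the data becomes readable by others without a close. On HDFS the flush step additionally has to push the data out to the DataNodes, which is the part sync() takes care of.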
> I mean, I had a lot of problems with append on Hadoop 0.20.2. I noticed that one of the guarantees of the append function, that readers can read data that has been flushed by the writer, was not working.
Apache Hadoop 0.20.2 had no append or sync features, so I am not sure
what you're calling broken. The sync feature from the 0.20-append branch
(which is present in 0.20.205/1.x, CDH3, etc.) works properly, and
hundreds of HBase users out there leverage it indirectly.
> For that reason they created the config parameter dfs.support.append, which is false by default, just to be used on development or test clusters. Is it true by default now in the latest version?
It isn't true now either. However, a recent change has split apart the
two calls: sync() is now enabled by default, and append() is
disabled by default unless you set dfs.support.broken.append to true.
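For a throwaway test cluster only, setting that property would look roughly like the fragment below. Placing it in hdfs-site.xml is an assumption based on where dfs.* properties usually live; it is not something confirmed in this thread, and given the known bugs this is not something to enable on a cluster holding data you care about.

```xml
<!-- Assumed location: hdfs-site.xml. Enables the known-broken append() call. -->
<property>
  <name>dfs.support.broken.append</name>
  <value>true</value>
</property>
```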
> If the append functionality is not stable yet, does anyone know if there is an estimate of when it will be?
0.23/2.0.0 should have a better implementation of append, but I haven't
tested it out personally. For most of my use cases, sync() (which in
2.0/0.23 is known as hflush() and hsync()) suffices.