Hadoop, mail # dev - Hadoop 0.20.0

RE: Hadoop 0.20.0
Jonathan Gray 2009-02-25, 20:29

This is a huge deal for me as an HBase user.  And as someone who does
significant consulting and evangelism for HBase, there is probably not a
more necessary feature at this stage of the project.

Writes in HBase are done in memory, and we also write each edit to the HLog,
which is periodically pushed to HDFS to prevent data loss should the node go
down in a way that does not allow us to flush everything properly.  Without
appends, we have to create a new file each time we "checkpoint".  In
practice, this was every 10,000 edits.

With appends, we only need to sync()/flush() to the same file.  In practice,
the default number of edits between flushes became 200, and for those who
require absolutely no data loss it can now be set to 1 without writing a new
HDFS file each time, meaning slow but tolerable performance.
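
To make the trade-off concrete, here is a toy model in Python.  It is not
HBase or HDFS code: simulate_log, LogStats and their parameters are names
made up for illustration, and "sync" here just means "the buffered edits
became durable".  It only counts how many log files get created and how many
acknowledged edits could be lost at the worst moment, with and without
append.

```python
from dataclasses import dataclass


@dataclass
class LogStats:
    files_created: int  # distinct log files written to the filesystem
    syncs: int          # times the buffered edits were made durable
    max_unsynced: int   # most edits ever left non-durable after an edit completes


def simulate_log(num_edits: int, edits_per_sync: int, append_supported: bool) -> LogStats:
    """Toy model of an edit log that is persisted every `edits_per_sync` edits.

    Assumption: without append, every persist step has to create a new file;
    with append, every persist step syncs the one file that is already open.
    """
    if edits_per_sync < 1:
        raise ValueError("edits_per_sync must be at least 1")

    files_created = 0
    syncs = 0
    unsynced = 0
    max_unsynced = 0

    for _ in range(num_edits):
        unsynced += 1
        if unsynced == edits_per_sync:
            syncs += 1
            if append_supported:
                files_created = 1   # keep reusing the same log file
            else:
                files_created += 1  # each checkpoint is a brand new file
            unsynced = 0
        # Edits acknowledged so far that would be lost if the node died now.
        max_unsynced = max(max_unsynced, unsynced)

    return LogStats(files_created, syncs, max_unsynced)


if __name__ == "__main__":
    print(simulate_log(20_000, 10_000, append_supported=False))
    print(simulate_log(20_000, 200, append_supported=True))
    print(simulate_log(20_000, 1, append_supported=True))
    print(simulate_log(20_000, 1, append_supported=False))
```

In this model, setting the interval to 1 without append means one new file
per edit, which is why that setting was not practical before; with append the
same setting costs one sync per edit against a single file.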

The upcoming 0.20.0 release of HBase is going to bring with it massive
performance boosts and should put us at or above the performance of our
competitors.  Data loss is a killer and is going to be enough to deter new and
existing users; at that point, we're no longer a fault-tolerant system.

I typically do not vote strongly for something if I'm not able to contribute
myself, but my impression from our testing over in HBase is that things are
not far off from working satisfactorily for our use case.  If it's only
partially working, I'd much prefer it be slapped with warnings about not
being fully functional than have what does work pulled out of the next
release.

Jonathan Gray

> -----Original Message-----
> From: Nigel Daley [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, February 24, 2009 4:02 PM
> Subject: Hadoop 0.20.0
> Folks,
> Hadoop 0.19.1 is now available with the file append feature disabled.
> It's time to talk about a Hadoop 0.20.0 release.
> Hadoop 0.20.0 feature freeze date was almost 3 months ago.  The last
> few blockers are now almost fixed (should be next week) except for
> HADOOP-4379.  HADOOP-4379 is work that is needed to properly implement
> file append.
> *** I propose we move HADOOP-4379 off to release 0.21.0 and apply the
> same disabling of file append in Hadoop 0.20.0 that we put in place to
> get 0.19.1 released (HADOOP-5224 and HADOOP-5225).
> I will call a vote for 0.20.0 when blockers are fixed.
> Cheers,
> Nigel
> > Folks,
> >
> > Some Hadoop deployments have upgraded to 0.19.0.  Clearly, the 0.19
> > branch has issues and a 0.19.1 release is needed.
> >
> > Quality issues in the changes made for the file append feature have
> > prevented some from deploying Hadoop 0.19.  One of these changes
> > (sync) has now been "fixed" by reducing its semantics in Hadoop
> > 0.18.3 (HADOOP-4997).  This was necessary to stabilize the 0.18
> > branch.
> >
> > I would like to propose that we apply this same "fix" to sync in
> > 0.19.1 and 0.20.0.  Since append requires the full semantics of
> > sync, I propose we also disable append (perhaps throw
> > UnsupportedOperationException from API?).  Yes, this would
> > unfortunately be an incompatible change between 0.19.0 and 0.19.1.
> > We can then take the time needed to fix append properly in 0.21.0.
> >
> > I will call a vote for 0.19.1 and 0.20.0 when blockers are fixed.
> >
> > Nigel