-
Add Append-HBase support in upcoming 20.205
sanjay Radia 2011-08-31, 18:41

I propose that the 20-append patches (details below) be included in 20.205, which will become the first official Apache release of Hadoop that supports Append and HBase.

Background:
There hasn't been an official Apache release that supports HBase. The HBase community has instead been using the 20-append branch; the patches were contributed by the HBase community, including Facebook. The Cloudera distribution has also included these patches. Andrew Purtell has ported these patches to the 20-security branch.

Risk Level:
These patches have been used and tested on large HBase clusters by FB, by those who use the 20-append branch directly (various users, including a 500-node HBase cluster at Yahoo), and by those who use the Cloudera distribution. We have reviewed the patches and have conducted further tests; testing and validation continue.

Patches:
HDFS-200. Support append and sync for hadoop 0.20 branch.
HDFS-142. Blocks that are being written by a client are stored in the blocksBeingWritten directory.
HDFS-1057. Concurrent readers hit ChecksumExceptions if following a writer to very end of file.
HDFS-724. Use a bidirectional heartbeat to detect stuck pipeline.
HDFS-895. Allow hflush/sync to occur in parallel with new writes to the file.
HDFS-1520. Lightweight NameNode operation recoverLease to trigger lease recovery.
HDFS-1555. Disallow pipeline recovery if a file is already being lease recovered.
HDFS-1554. New semantics for recoverLease.
HDFS-988. Fix bug where saveNamespace can corrupt edits log.
HDFS-826. Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline.
HDFS-630. Client can exclude specific nodes in the write pipeline.
HDFS-1141. completeFile does not check lease ownership.
HDFS-1204. Lease expiration should recover single files, not entire lease holder.
HDFS-1254. Support append/sync via the default configuration.
HDFS-1346. DFSClient receives out of order packet ack.
HDFS-1054. Remove sleep before retry for allocating a block.
-
Re: Add Append-HBase support in upcoming 20.205
Milind.Bhandarkar@... 2011-08-31, 22:07

FWIW, Stack has already done the work needed to make sure that HBase works with the Hadoop 0.22 branch, and I suppose that if https://issues.apache.org/jira/browse/MAPREDUCE-2767 is committed, it removes the last blocker from 0.22.0, so that it can be released.

I am cc'ing hbase-dev, since this is relevant to them as well.

- Milind
-
Re: Add Append-HBase support in upcoming 20.205
Todd Lipcon 2011-09-01, 06:43

On Wed, Aug 31, 2011 at 3:07 PM, <[EMAIL PROTECTED]> wrote:
> FWIW, Stack has already done the work needed to make sure that Hbase works
> with Hadoop 0.22 branch, and I suppose if
> https://issues.apache.org/jira/browse/MAPREDUCE-2767 is committed, it
> removes the last blocker from 0.22.0, so that it can be released.

The 0.22 implementation "works" but there are certainly still bugs in it. If other HDFS committers familiar with the new append could help here, that would be very much appreciated. For example, https://issues.apache.org/jira/browse/HDFS-2288 can cause HBase to fail to recover its WAL during a crash scenario. There are some others that I'll likely be working through in the coming months.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
-
Re: Add Append-HBase support in upcoming 20.205
Dhruba Borthakur 2011-09-01, 18:09

This seems like a good effort to allow HBase to run on a "released" Apache branch. +1.

-dhruba

--
Connect to me at http://www.facebook.com/dhruba
-
Re: Add Append-HBase support in upcoming 20.205
Milind.Bhandarkar@... 2011-09-01, 19:48

> For example, https://issues.apache.org/jira/browse/HDFS-2288 can cause
> HBase to fail to recover its WAL during a crash scenario. There are
> some others that I'll be likely working through in the coming months.

Thanks Todd. Will go through it to test against 0.22.

- milind

---
Milind Bhandarkar
Greenplum Labs, EMC
(Disclaimer: Opinions expressed in this email are those of the author, and do not necessarily represent the views of any organization, past or present, the author might be affiliated with.)
-
Re: Add Append-HBase support in upcoming 20.205
Arun C Murthy 2011-09-01, 22:36

On Aug 31, 2011, at 11:41 AM, sanjay Radia wrote:
> I propose that the 20-append patches (details below) be included in 20.205 which will become the first official Apache
> release of Hadoop that supports Append and HBase.

+1

I think it's high time (I thought so too last Dec too: http://s.apache.org/jr) we had an official Hadoop release which supports HBase.

Thanks for all the effort Andrew - it will be really nice to have an Apache 0.20 with security+append!

Arun
-
RE: Add Append-HBase support in upcoming 20.205
Rottinghuis, Joep 2011-09-02, 02:56

It seems that HBase was made to compile against 0.23 (trunk at the time). See HBASE-4327.

Thanks,

Joep
-
Re: Add Append-HBase support in upcoming 20.205
Stack 2011-09-02, 04:08

+1

I'm biased. And if we were adding any other feature but sync/append in a minor release on 0.20 I'd be praising the work but not voting for its inclusion. So I'm a hypocrite too.... but I can't help myself.

St.Ack

P.S. Below is the hbase project's 'official' story on the version of hadoop users can run against our current stable offering. It's from our 'manual' up on the hbase website. It might look to you like a mess, but the text is actually hard-won after feedback from folks who have had to navigate its intricacies.

"2.3. Hadoop

This version of HBase will only run on Hadoop 0.20.x. It will not run on hadoop 0.21.x (nor 0.22.x). HBase will lose data unless it is running on an HDFS that has a durable sync. Hadoop 0.20.2 and Hadoop 0.20.203.0 DO NOT have this attribute. Currently only the branch-0.20-append branch has a working sync[5]. No official releases have been made from the branch-0.20-append branch up to now so you will have to build your own Hadoop from the tip of this branch. Michael Noll has written a detailed blog, Building an Hadoop 0.20.x version for HBase 0.90.2, on how to build an Hadoop from branch-0.20-append. Recommended [6]. Or rather than build your own, you could use the Cloudera or MapR distributions...."

See http://hbase.apache.org/book.html#hadoop
-
Re: Add Append-HBase support in upcoming 20.205
Andrew Purtell 2011-09-02, 05:56

> From: Arun C Murthy <[EMAIL PROTECTED]>
> +1
>
> I think it's high time (I thought so too last Dec too:
> http://s.apache.org/jr) we had an official Hadoop release which supports
> HBase. Thanks for all the effort Andrew - it will be really nice to have
> an Apache 0.20 with security+append!

Thanks to Dhruba, Todd, Hairong, and the other original contributors of append support to HDFS 0.20.x.

I'm +1 obviously. :-)

Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
-
Re: Add Append-HBase support in upcoming 20.205
Arun C Murthy 2011-09-02, 06:01

On Sep 1, 2011, at 10:56 PM, Andrew Purtell wrote:
> Thanks to Dhruba, Todd, Hairong, and the other original contributors of append support to HDFS 0.20.x.

But of course, I really should have qualified my statement by saying: thanks for porting branch-0.20-append on 0.20.2xx.

Yes, thanks to everyone who contributed to branch-0.20-append of course.

Arun
-
Re: Add Append-HBase support in upcoming 20.205
Suresh Srinivas 2011-09-02, 18:20

I also propose the following jiras, which are non-append-related bug fixes from the 0.20-append branch:
- HDFS-1164. TestHdfsProxy is failing.
- HDFS-1211. Block receiver should not log "rewind" packets at INFO level.
- HDFS-1118. Fix socket leak on DFSClient.
- HDFS-1210. DFSClient should log exception when block recovery fails.
- HDFS-606. Fix ConcurrentModificationException in invalidateCorruptReplicas.
- HDFS-561. Fix write pipeline READ_TIMEOUT.
- HDFS-1202. DataBlockScanner throws NPE when updated before initialized.

Risk Level:
These are useful bug fixes from the append branch and are not big changes to the code base. These jiras have already been merged into the 0.20-security branch.
-
Re: Add Append-HBase support in upcoming 20.205
Todd Lipcon 2011-09-02, 20:03

The following other JIRAs have been committed in CDH for 18 months or so, for the purpose of HBase. You may want to consider backporting them as well - many were never committed to 0.20-append due to lack of reviews by HDFS committers at the time.

HDFS-1056. Fix possible multinode deadlocks during block recovery when using ephemeral dataxceiv
Description: Fixes the logic by which datanodes identify local RPC targets during block recovery for the case when the datanode is configured with an ephemeral data transceiver port.
Reason: Potential internode deadlock for clusters using ephemeral ports

HADOOP-6722. Workaround a TCP spec quirk by not allowing NetUtils.connect to connect to itself
Description: TCP's ephemeral port assignment results in the possibility that a client can connect back to its own outgoing socket, resulting in failed RPCs or datanode transfers.
Reason: Fixes intermittent errors in cluster testing with ephemeral IPC/transceiver ports on datanodes.

HDFS-1122. Don't allow client verification to prematurely add inprogress blocks to DataBlockScanner
Description: When a client reads a block that is also open for writing, it should not add it to the datanode block scanner. If it does, the block scanner can incorrectly mark the block as corrupt, causing data loss.
Reason: Potential dataloss with concurrent writer-reader case.

HDFS-1248. Miscellaneous cleanup and improvements on 0.20 append branch
Description: Miscellaneous code cleanup and logging changes, including:
- Slight cleanup to recoverFile() function in TestFileAppend4
- Improve error messages on OP_READ_BLOCK
- Some comment cleanup in FSNamesystem
- Remove toInodeUnderConstruction (was not used)
- Add some checks for null blocks in FSNamesystem to avoid a possible NPE
- Only log "inconsistent size" warnings at WARN level for non-under-construction blocks.
- Redundant addStoredBlock calls are also not worthy of WARN level
- Add some extra information to a warning in ReplicationTargetChooser
Reason: Improves diagnosis of error cases and clarity of code

HDFS-1242. Add unit test for the appendFile race condition / synchronization bug fixed in HDFS-142
Reason: Test coverage for previously applied patch.

HDFS-1218. Replicas that are recovered during DN startup should not be allowed to truncate better replicas.
Description: If a datanode loses power and then recovers, its replicas may be truncated due to the recovery of the local FS journal. This patch ensures that a replica truncated by a power loss does not truncate the block on HDFS.
Reason: Potential dataloss bug uncovered by power failure simulation

HDFS-915. Write pipeline hangs for too long when ResponseProcessor hits timeout
Description: Previously, the write pipeline would hang for the entire write timeout when it encountered a read timeout (e.g. due to a network connectivity issue). This patch interrupts the writing thread when a read error occurs.
Reason: Faster recovery from pipeline failure for HBase and other interactive applications.

HDFS-1186. Writers should be interrupted when recovery is started, not when it's completed.
Description: When the write pipeline recovery process is initiated, this interrupts any concurrent writers to the block under recovery. This prevents a case where some edits may be lost if the writer has lost its lease but continues to write (e.g. due to a garbage collection pause).
Reason: Fixes a potential dataloss bug

commit a960eea40dbd6a4e87072bdf73ac3b62e772f70a
Author: Todd Lipcon <[EMAIL PROTECTED]>
Date: Sun Jun 13 23:02:38 2010 -0700

HDFS-1197. Received blocks should not be added to block map prematurely for under construction files
Description: Fixes a possible dataloss scenario when using append() on real-life clusters. Also augments unit tests to uncover similar bugs in the future by simulating latency when reporting blocks received by datanodes.
Reason: Append support dataloss bug
Author: Todd Lipcon

HDFS-1260. tryUpdateBlock should do validation before renaming meta file
Description: Solves bug where block became inaccessible in certain failure conditions (particularly network partitions). Observed under HBase workload at user site.
Reason: Potential loss of synced data when write pipeline fails

Todd Lipcon
Software Engineer, Cloudera
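
For readers unfamiliar with the TCP quirk behind HADOOP-6722 above: when a client connects to a port in the ephemeral range on its own host, the kernel can pick that same port as the client's source port, and the socket ends up connected to itself. The following is only an illustrative sketch in Python of the kind of check the JIRA title describes; it is not the Hadoop NetUtils code, and the function names `is_self_connection` and `connect_rejecting_self` are made up for this example.

```python
import socket


def is_self_connection(sock):
    """Return True when the socket's local endpoint equals its remote endpoint.

    A socket whose local (address, port) matches its peer (address, port)
    has connected back to itself rather than to a real server.
    """
    return sock.getsockname() == sock.getpeername()


def connect_rejecting_self(host, port, timeout=5.0):
    """Connect to (host, port), refusing a connection that loops back to itself.

    Hypothetical helper for illustration only: the error type and message
    are assumptions, not Hadoop behaviour.
    """
    sock = socket.create_connection((host, port), timeout=timeout)
    if is_self_connection(sock):
        sock.close()
        raise ConnectionRefusedError(
            "socket connected to itself on %s:%d; refusing" % (host, port)
        )
    return sock
```

A caller would treat the raised error like any other refused connection and retry, which is presumably what makes the intermittent test failures mentioned in the "Reason" line go away.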
-
Re: Add Append-HBase support in upcoming 20.205Matt Foley 2011-09-02, 21:34
Hi Todd,
Thank you, this is tremendously valuable input! I'll have to look in detail
at each of these ten jiras, and will get back to the list with more info
shortly.
--Matt

On Fri, Sep 2, 2011 at 1:03 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:

> The following other JIRAs have been committed in CDH for 18 months or
> so, for the purpose of HBase. You may want to consider backporting
> them as well - many were never committed to 0.20-append due to lack of
> reviews by HDFS committers at the time.
>
> HDFS-1056. Fix possible multinode deadlocks during block recovery
> when using ephemeral dataxceiv
>
> Description: Fixes the logic by which datanodes identify local RPC targets
>              during block recovery for the case when the datanode
>              is configured with an ephemeral data transceiver port.
> Reason: Potential internode deadlock for clusters using ephemeral ports
>
> HADOOP-6722. Workaround a TCP spec quirk by not allowing
> NetUtils.connect to connect to itself
>
> Description: TCP's ephemeral port assignment results in the possibility
>              that a client can connect back to its own outgoing socket,
>              resulting in failed RPCs or datanode transfers.
> Reason: Fixes intermittent errors in cluster testing with ephemeral
>         IPC/transceiver ports on datanodes.
>
> HDFS-1122. Don't allow client verification to prematurely add
> inprogress blocks to DataBlockScanner
>
> Description: When a client reads a block that is also open for writing,
>              it should not add it to the datanode block scanner.
>              If it does, the block scanner can incorrectly mark the
>              block as corrupt, causing data loss.
> Reason: Potential dataloss with concurrent writer-reader case.
>
> HDFS-1248. Miscellaneous cleanup and improvements on 0.20 append branch
>
> Description: Miscellaneous code cleanup and logging changes, including:
>  - Slight cleanup to recoverFile() function in TestFileAppend4
>  - Improve error messages on OP_READ_BLOCK
>  - Some comment cleanup in FSNamesystem
>  - Remove toInodeUnderConstruction (was not used)
>  - Add some checks for null blocks in FSNamesystem to avoid a possible NPE
>  - Only log "inconsistent size" warnings at WARN level for
>    non-under-construction blocks.
>  - Redundant addStoredBlock calls are also not worthy of WARN level
>  - Add some extra information to a warning in ReplicationTargetChooser
> Reason: Improves diagnosis of error cases and clarity of code
>
> HDFS-1242. Add unit test for the appendFile race condition /
> synchronization bug fixed in HDFS-142
>
> Reason: Test coverage for previously applied patch.
>
> HDFS-1218. Replicas that are recovered during DN startup should
> not be allowed to truncate better replicas.
>
> Description: If a datanode loses power and then recovers, its replicas
>              may be truncated due to the recovery of the local FS
>              journal. This patch ensures that a replica truncated by
>              a power loss does not truncate the block on HDFS.
> Reason: Potential dataloss bug uncovered by power failure simulation
>
> HDFS-915. Write pipeline hangs for too long when ResponseProcessor
> hits timeout
>
> Description: Previously, the write pipeline would hang for the entire write
>              timeout when it encountered a read timeout (eg due to a
>              network connectivity issue). This patch interrupts the writing
>              thread when a read error occurs.
> Reason: Faster recovery from pipeline failure for HBase and other
>         interactive applications.
>
> HDFS-1186. Writers should be interrupted when recovery is started,
> not when it's completed.
>
> Description: When the write pipeline recovery process is initiated, this
>              interrupts any concurrent writers to the block under
>              recovery.
>              This prevents a case where some edits may be lost if the
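
[Editor's illustration] The HDFS-915 entry quoted above describes interrupting the writing thread as soon as the response reader hits an error, rather than letting the writer sit out the full write timeout. The following is only a minimal sketch of that idea, not the actual DFSClient code: the class `PipelineInterruptSketch`, the method `runOnce`, and the use of `Thread.sleep` as a stand-in for a stalled pipeline write are all assumptions made for illustration.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: names and structure are illustrative, not Hadoop's.
final class PipelineInterruptSketch {

    /**
     * Simulates a writer stalled on a slow pipeline and a response reader
     * that hits an error. Returns true if the writer was interrupted
     * instead of waiting out the whole write timeout.
     */
    static boolean runOnce(long writeTimeoutMs) {
        final boolean[] interrupted = {false};
        final CountDownLatch writerStarted = new CountDownLatch(1);

        // Stand-in for the writing thread: blocks for up to the write timeout.
        final Thread writer = new Thread(() -> {
            writerStarted.countDown();
            try {
                Thread.sleep(writeTimeoutMs);
            } catch (InterruptedException e) {
                interrupted[0] = true; // woken early by the responder
            }
        }, "writer");

        // Stand-in for the response reader: on a read error it interrupts
        // the writer right away.
        final Thread responder = new Thread(() -> {
            try {
                writerStarted.await();
            } catch (InterruptedException e) {
                return;
            }
            writer.interrupt(); // simulated read timeout detected here
        }, "responder");

        writer.start();
        responder.start();
        try {
            responder.join();
            writer.join(); // join makes the writer's flag update visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return interrupted[0];
    }
}
```

Calling `PipelineInterruptSketch.runOnce(60_000L)` returns almost immediately even though the simulated write timeout is a minute. Note that `Thread.interrupt()` wakes a thread blocked in `sleep`, `wait`, or an interruptible NIO channel; a thread blocked in a plain `java.io` socket write is not woken this way, so a real client needs its I/O to be interruptible for this approach to work.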
-
Re: Add Append-HBase support in upcoming 20.205
Eli Collins 2011-09-02, 21:48
Hey Matt,
You can see the full change log here:
http://archive.cloudera.com/cdh/3/hadoop-0.20.2+923.97.CHANGES.txt

Most changes done for HBase have it listed in the "Reason" field. There's a
directory in the source tarball that contains all the individual patches
broken out.

Cheers,
Eli

On Fri, Sep 2, 2011 at 2:34 PM, Matt Foley <[EMAIL PROTECTED]> wrote:

> Hi Todd,
> Thank you, this is tremendously valuable input! I'll have to look in detail
> at each of these ten jiras,
> and will get back to the list with more info shortly.
> --Matt
>
> On Fri, Sep 2, 2011 at 1:03 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
>
>> The following other JIRAs have been committed in CDH for 18 months or
>> so, for the purpose of HBase. You may want to consider backporting
>> them as well - many were never committed to 0.20-append due to lack of
>> reviews by HDFS committers at the time.
>>
>> HDFS-1056. Fix possible multinode deadlocks during block recovery
>> when using ephemeral dataxceiv
>>
>> Description: Fixes the logic by which datanodes identify local RPC targets
>>              during block recovery for the case when the datanode
>>              is configured with an ephemeral data transceiver port.
>> Reason: Potential internode deadlock for clusters using ephemeral ports
>>
>> HADOOP-6722. Workaround a TCP spec quirk by not allowing
>> NetUtils.connect to connect to itself
>>
>> Description: TCP's ephemeral port assignment results in the possibility
>>              that a client can connect back to its own outgoing socket,
>>              resulting in failed RPCs or datanode transfers.
>> Reason: Fixes intermittent errors in cluster testing with ephemeral
>>         IPC/transceiver ports on datanodes.
>>
>> HDFS-1122. Don't allow client verification to prematurely add
>> inprogress blocks to DataBlockScanner
>>
>> Description: When a client reads a block that is also open for writing,
>>              it should not add it to the datanode block scanner.
>>              If it does, the block scanner can incorrectly mark the
>>              block as corrupt, causing data loss.
>> Reason: Potential dataloss with concurrent writer-reader case.
>>
>> HDFS-1248. Miscellaneous cleanup and improvements on 0.20 append branch
>>
>> Description: Miscellaneous code cleanup and logging changes, including:
>>  - Slight cleanup to recoverFile() function in TestFileAppend4
>>  - Improve error messages on OP_READ_BLOCK
>>  - Some comment cleanup in FSNamesystem
>>  - Remove toInodeUnderConstruction (was not used)
>>  - Add some checks for null blocks in FSNamesystem to avoid a possible NPE
>>  - Only log "inconsistent size" warnings at WARN level for
>>    non-under-construction blocks.
>>  - Redundant addStoredBlock calls are also not worthy of WARN level
>>  - Add some extra information to a warning in ReplicationTargetChooser
>> Reason: Improves diagnosis of error cases and clarity of code
>>
>> HDFS-1242. Add unit test for the appendFile race condition /
>> synchronization bug fixed in HDFS-142
>>
>> Reason: Test coverage for previously applied patch.
>>
>> HDFS-1218. Replicas that are recovered during DN startup should
>> not be allowed to truncate better replicas.
>>
>> Description: If a datanode loses power and then recovers, its replicas
>>              may be truncated due to the recovery of the local FS
>>              journal. This patch ensures that a replica truncated by
>>              a power loss does not truncate the block on HDFS.
>> Reason: Potential dataloss bug uncovered by power failure simulation
>>
>> HDFS-915. Write pipeline hangs for too long when ResponseProcessor
>> hits timeout
>>
>> Description: Previously, the write pipeline would hang for the entire write
>>              timeout when it encountered a read timeout (eg due to a
>>              network connectivity issue). This patch interrupts the
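
[Editor's illustration] The HADOOP-6722 entry quoted in this message says a client can end up connected to its own outgoing socket when the ephemeral port it is assigned equals the port it is dialing. One way such a guard could look is sketched below, assuming the check is simply a comparison of the local and remote endpoints after `connect` returns. The class `SelfConnectGuard` and its methods are hypothetical names for illustration, not the code in Hadoop's NetUtils.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical sketch: names are illustrative, not Hadoop's NetUtils API.
final class SelfConnectGuard {

    /** True when both ends of a connection are the same address and port. */
    public static boolean isSelfConnection(InetAddress localAddr, int localPort,
                                           InetAddress remoteAddr, int remotePort) {
        return localPort == remotePort && localAddr.equals(remoteAddr);
    }

    /**
     * Connects the socket, then rejects the connection if the socket has
     * looped back onto its own local endpoint.
     */
    public static void connect(Socket socket, InetSocketAddress endpoint,
                               int timeoutMs) throws IOException {
        socket.connect(endpoint, timeoutMs);
        if (isSelfConnection(socket.getLocalAddress(), socket.getLocalPort(),
                             socket.getInetAddress(), socket.getPort())) {
            socket.close();
            throw new ConnectException(
                "Socket connected to itself at " + endpoint);
        }
    }
}
```

Surfacing the condition as a `ConnectException` lets the caller's normal retry path handle it like any other refused connection, which seems to match the "workaround" framing of the JIRA title; whether the real patch reports it this way is an assumption here.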