|
Colin McCabe
2013-02-17, 21:48
Stack
2013-02-18, 01:49
Andrew Purtell
2013-02-18, 02:35
Tsz Wo Sze
2013-02-18, 22:03
Todd Lipcon
2013-02-20, 00:11
Patrick Angeles
2013-02-20, 18:08
Tsz Wo Sze
2013-02-20, 19:56
Todd Lipcon
2013-02-20, 20:16
Tsz Wo Sze
2013-02-20, 21:48
Todd Lipcon
2013-02-20, 22:27
Suresh Srinivas
2013-02-20, 22:49
Tsz Wo Sze
2013-02-20, 23:01
Todd Lipcon
2013-02-20, 23:01
Todd Lipcon
2013-02-20, 23:06
Todd Lipcon
2013-02-20, 23:06
Tsz Wo Sze
2013-02-20, 23:08
Todd Lipcon
2013-02-20, 23:13
Suresh Srinivas
2013-02-20, 23:19
Suresh Srinivas
2013-02-20, 23:31
Todd Lipcon
2013-02-20, 23:40
Suresh Srinivas
2013-02-21, 00:04
Todd Lipcon
2013-02-21, 00:12
Suresh Srinivas
2013-02-21, 00:28
Chris Douglas
2013-02-21, 00:29
Suresh Srinivas
2013-02-21, 00:47
Aaron T. Myers
2013-02-21, 01:12
Andrew Purtell
2013-02-21, 01:32
Suresh Srinivas
2013-02-21, 15:40
Chris Douglas
2013-02-21, 21:24
Tsz Wo Sze
2013-02-21, 22:15
Colin McCabe
2013-02-22, 19:13
Eli Collins
2013-02-22, 21:55
Tsz Wo Sze
2013-02-23, 02:32
Aaron T. Myers
2013-02-23, 02:40
Tsz Wo Sze
2013-02-24, 00:23
Eli Collins
2013-02-25, 18:24
Colin McCabe
2013-02-25, 18:31
Tsz Wo Sze
2013-02-25, 20:50
Eli Collins
2013-02-25, 21:16
Chris Douglas
2013-02-25, 21:50
Eli Collins
2013-02-25, 22:01
Suresh Srinivas
2013-02-26, 00:09
Eli Collins
2013-02-26, 00:39
Suresh Srinivas
2013-02-26, 17:33
Eli Collins
2013-02-26, 19:24
Suresh Srinivas
2013-02-26, 19:35
Bikas Saha
2013-02-26, 21:47
Eli Collins
2013-02-26, 21:51
Todd Lipcon
2013-02-26, 22:07
Chris Douglas
2013-02-27, 00:52
Suresh Srinivas
2013-02-27, 01:09
sanjay Radia
2013-02-27, 01:36
sanjay Radia
2013-02-27, 19:45
Eli Collins
2013-02-27, 20:06
Colin McCabe
2013-02-27, 23:28
Chris Douglas
2013-02-27, 23:29
Eli Collins
2013-02-27, 23:42
Colin McCabe
2013-03-05, 20:24
Suresh Srinivas
2013-03-05, 21:09
Tsz Wo Sze
2013-03-05, 23:08
Colin McCabe
2013-04-01, 23:32
|
-
VOTE: HDFS-347 mergeColin McCabe 2013-02-17, 21:48
Hi all,
I would like to merge the HDFS-347 branch back to trunk. It's been under intensive review and testing for several months. The branch adds a lot of new unit tests, and passes Jenkins as of 2/15 [1] We have tested HDFS-347 with both random and sequential workloads. The short-circuit case is substantially faster [2], and overall performance looks very good. This is especially encouraging given that the initial goal of this work was to make security compatible with short-circuit local reads, rather than to optimize the short-circuit code path. We've also stress-tested HDFS-347 on a number of clusters. This iniial VOTE is to merge only into trunk. Just as we have done with our other recent merges, we will consider merging into branch-2 after the code has been in trunk for few weeks. Please cast your vote by EOD Sunday 2/24. best, Colin McCabe [1] https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13579704&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13579704 [2] https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13551755&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13551755
-
Re: VOTE: HDFS-347 mergeStack 2013-02-18, 01:49
+1
On Sun, Feb 17, 2013 at 1:48 PM, Colin McCabe <[EMAIL PROTECTED]>wrote: > Hi all, > > I would like to merge the HDFS-347 branch back to trunk. It's been > under intensive review and testing for several months. The branch > adds a lot of new unit tests, and passes Jenkins as of 2/15 [1] > > We have tested HDFS-347 with both random and sequential workloads. The > short-circuit case is substantially faster [2], and overall > performance looks very good. This is especially encouraging given > that the initial goal of this work was to make security compatible > with short-circuit local reads, rather than to optimize the > short-circuit code path. We've also stress-tested HDFS-347 on a > number of clusters. > > This iniial VOTE is to merge only into trunk. Just as we have done > with our other recent merges, we will consider merging into branch-2 > after the code has been in trunk for few weeks. > > Please cast your vote by EOD Sunday 2/24. > > best, > Colin McCabe > > [1] > https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13579704&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13579704 > > [2] > https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13551755&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13551755 >
-
Re: VOTE: HDFS-347 mergeAndrew Purtell 2013-02-18, 02:35
+1 (non binding)
If this gets in, a backport to branch-2 would be most appreciated. On Sun, Feb 17, 2013 at 1:48 PM, Colin McCabe <[EMAIL PROTECTED]>wrote: > Hi all, > > I would like to merge the HDFS-347 branch back to trunk. It's been > under intensive review and testing for several months. The branch > adds a lot of new unit tests, and passes Jenkins as of 2/15 [1] > > We have tested HDFS-347 with both random and sequential workloads. The > short-circuit case is substantially faster [2], and overall > performance looks very good. This is especially encouraging given > that the initial goal of this work was to make security compatible > with short-circuit local reads, rather than to optimize the > short-circuit code path. We've also stress-tested HDFS-347 on a > number of clusters. > > This iniial VOTE is to merge only into trunk. Just as we have done > with our other recent merges, we will consider merging into branch-2 > after the code has been in trunk for few weeks. > > Please cast your vote by EOD Sunday 2/24. > > best, > Colin McCabe > > [1] > https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13579704&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13579704 > > [2] > https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13551755&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13551755 > -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
-
Re: VOTE: HDFS-347 mergeTsz Wo Sze 2013-02-18, 22:03
Hi Colin,
The latest HDFS-347 patch was posted on Feb 16. Because of the long weekends, I still do not have a chance to check it. Will do it in this week. Nicholas ________________________________ From: Colin McCabe <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] Sent: Sunday, February 17, 2013 1:48 PM Subject: VOTE: HDFS-347 merge Hi all, I would like to merge the HDFS-347 branch back to trunk. It's been under intensive review and testing for several months. The branch adds a lot of new unit tests, and passes Jenkins as of 2/15 [1] We have tested HDFS-347 with both random and sequential workloads. The short-circuit case is substantially faster [2], and overall performance looks very good. This is especially encouraging given that the initial goal of this work was to make security compatible with short-circuit local reads, rather than to optimize the short-circuit code path. We've also stress-tested HDFS-347 on a number of clusters. This iniial VOTE is to merge only into trunk. Just as we have done with our other recent merges, we will consider merging into branch-2 after the code has been in trunk for few weeks. Please cast your vote by EOD Sunday 2/24. best, Colin McCabe [1] https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13579704&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13579704 [2] https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13551755&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13551755
-
Re: VOTE: HDFS-347 mergeTodd Lipcon 2013-02-20, 00:11
+1 (binding)
I code-reviewed almost all of the code in this branch, and also spent some time benchmarking and testing under various workloads. We've also done significant testing on clusters here at Cloudera, both secure and insecure, and verified integration with a number of other ecosystem components (eg Pig, Hive, Impala, HBase, MR, etc). The feature works as advertised and should provide much better performance for a number of workloads, especially in secure environments. Thanks for the hard work, Colin! -Todd On Sun, Feb 17, 2013 at 1:48 PM, Colin McCabe <[EMAIL PROTECTED]>wrote: > Hi all, > > I would like to merge the HDFS-347 branch back to trunk. It's been > under intensive review and testing for several months. The branch > adds a lot of new unit tests, and passes Jenkins as of 2/15 [1] > > We have tested HDFS-347 with both random and sequential workloads. The > short-circuit case is substantially faster [2], and overall > performance looks very good. This is especially encouraging given > that the initial goal of this work was to make security compatible > with short-circuit local reads, rather than to optimize the > short-circuit code path. We've also stress-tested HDFS-347 on a > number of clusters. > > This iniial VOTE is to merge only into trunk. Just as we have done > with our other recent merges, we will consider merging into branch-2 > after the code has been in trunk for few weeks. > > Please cast your vote by EOD Sunday 2/24. > > best, > Colin McCabe > > [1] > https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13579704&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13579704 > > [2] > https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13551755&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13551755 > -- Todd Lipcon Software Engineer, Cloudera
-
Re: VOTE: HDFS-347 mergePatrick Angeles 2013-02-20, 18:08
+1 (non binding)
This is great... On Tue, Feb 19, 2013 at 7:11 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote: > +1 (binding) > > I code-reviewed almost all of the code in this branch, and also spent some > time benchmarking and testing under various workloads. We've also done > significant testing on clusters here at Cloudera, both secure and insecure, > and verified integration with a number of other ecosystem components (eg > Pig, Hive, Impala, HBase, MR, etc). The feature works as advertised and > should provide much better performance for a number of workloads, > especially in secure environments. > > Thanks for the hard work, Colin! > > -Todd > > On Sun, Feb 17, 2013 at 1:48 PM, Colin McCabe <[EMAIL PROTECTED] > >wrote: > > > Hi all, > > > > I would like to merge the HDFS-347 branch back to trunk. It's been > > under intensive review and testing for several months. The branch > > adds a lot of new unit tests, and passes Jenkins as of 2/15 [1] > > > > We have tested HDFS-347 with both random and sequential workloads. The > > short-circuit case is substantially faster [2], and overall > > performance looks very good. This is especially encouraging given > > that the initial goal of this work was to make security compatible > > with short-circuit local reads, rather than to optimize the > > short-circuit code path. We've also stress-tested HDFS-347 on a > > number of clusters. > > > > This iniial VOTE is to merge only into trunk. Just as we have done > > with our other recent merges, we will consider merging into branch-2 > > after the code has been in trunk for few weeks. > > > > Please cast your vote by EOD Sunday 2/24. > > > > best, > > Colin McCabe > > > > [1] > > > https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13579704&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13579704 > > > > [2] > > > https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13551755&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13551755 > > > > > > -- > Todd Lipcon > Software Engineer, Cloudera >
-
Re: VOTE: HDFS-347 mergeTsz Wo Sze 2013-02-20, 19:56
-1
The patch seems not ready yet. I have posted some comments/suggestions on the JIRA. Colin also has agreed that there are some bugs to be fixed. Sorry. Tsz-Wo ________________________________ From: Todd Lipcon <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] Sent: Tuesday, February 19, 2013 4:11 PM Subject: Re: VOTE: HDFS-347 merge +1 (binding) I code-reviewed almost all of the code in this branch, and also spent some time benchmarking and testing under various workloads. We've also done significant testing on clusters here at Cloudera, both secure and insecure, and verified integration with a number of other ecosystem components (eg Pig, Hive, Impala, HBase, MR, etc). The feature works as advertised and should provide much better performance for a number of workloads, especially in secure environments. Thanks for the hard work, Colin! -Todd On Sun, Feb 17, 2013 at 1:48 PM, Colin McCabe <[EMAIL PROTECTED]>wrote: > Hi all, > > I would like to merge the HDFS-347 branch back to trunk. It's been > under intensive review and testing for several months. The branch > adds a lot of new unit tests, and passes Jenkins as of 2/15 [1] > > We have tested HDFS-347 with both random and sequential workloads. The > short-circuit case is substantially faster [2], and overall > performance looks very good. This is especially encouraging given > that the initial goal of this work was to make security compatible > with short-circuit local reads, rather than to optimize the > short-circuit code path. We've also stress-tested HDFS-347 on a > number of clusters. > > This iniial VOTE is to merge only into trunk. Just as we have done > with our other recent merges, we will consider merging into branch-2 > after the code has been in trunk for few weeks. > > Please cast your vote by EOD Sunday 2/24. > > best, > Colin McCabe > > [1] > https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13579704&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13579704 > > [2] > https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13551755&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13551755 > -- Todd Lipcon Software Engineer, Cloudera
-
Re: VOTE: HDFS-347 mergeTodd Lipcon 2013-02-20, 20:16
Hi Nicholas,
I looked at your comments on the JIRA, and they all seem like trivial things that could be addressed post-merge, and none of them would affect the functionality. If Colin addresses these issues, will you amend your vote to +1 within the called-for voting period? It concerns me that we've been asking for reviews on this branch for multiple months now, and yet you're only bringing up some of these things now that a merge vote is called. Colin sentp a note to this list a month ago (http://markmail.org/message/phcfc3watwlqiemw) saying that the merge was coming soon. Since then, we found a few small bugs around the configuration/setup code, but all of the things you're bringing up in the review now have been in the branch since the new year, so I feel like there has been quite ample time for review. -Todd On Wed, Feb 20, 2013 at 11:56 AM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote: > -1 > The patch seems not ready yet. I have posted some comments/suggestions on the JIRA. Colin also has agreed that there are some bugs to be fixed. Sorry. > > Tsz-Wo > > > > > ________________________________ > From: Todd Lipcon <[EMAIL PROTECTED]> > To: [EMAIL PROTECTED] > Sent: Tuesday, February 19, 2013 4:11 PM > Subject: Re: VOTE: HDFS-347 merge > > +1 (binding) > > I code-reviewed almost all of the code in this branch, and also spent some > time benchmarking and testing under various workloads. We've also done > significant testing on clusters here at Cloudera, both secure and insecure, > and verified integration with a number of other ecosystem components (eg > Pig, Hive, Impala, HBase, MR, etc). The feature works as advertised and > should provide much better performance for a number of workloads, > especially in secure environments. > > Thanks for the hard work, Colin! > > -Todd > > On Sun, Feb 17, 2013 at 1:48 PM, Colin McCabe <[EMAIL PROTECTED]>wrote: > >> Hi all, >> >> I would like to merge the HDFS-347 branch back to trunk. It's been >> under intensive review and testing for several months. The branch >> adds a lot of new unit tests, and passes Jenkins as of 2/15 [1] >> >> We have tested HDFS-347 with both random and sequential workloads. The >> short-circuit case is substantially faster [2], and overall >> performance looks very good. This is especially encouraging given >> that the initial goal of this work was to make security compatible >> with short-circuit local reads, rather than to optimize the >> short-circuit code path. We've also stress-tested HDFS-347 on a >> number of clusters. >> >> This iniial VOTE is to merge only into trunk. Just as we have done >> with our other recent merges, we will consider merging into branch-2 >> after the code has been in trunk for few weeks. >> >> Please cast your vote by EOD Sunday 2/24. >> >> best, >> Colin McCabe >> >> [1] >> https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13579704&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13579704 >> >> [2] >> https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13551755&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13551755 >> > > > > -- > Todd Lipcon > Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera
-
Re: VOTE: HDFS-347 mergeTsz Wo Sze 2013-02-20, 21:48
The previous patch of HDFS-347 was posted on Jan 31 (2013.01.31.consolidated2.patch) I have tried to review it but the code is quite unreadable at that time. Then, the next patch is the latest patch 2013.02.15.consolidated4.patch posted in the evening of Feb 15, right before the weekends. As mentioned previously, I did not get a chance to check it until yesterday (Feb 19).
The currently patch is still not yet ready. It seems to have unnecessarily changed the API and protocol. I believe those are important but not trivial things. Tsz-Wo ________________________________ From: Todd Lipcon <[EMAIL PROTECTED]> To: [EMAIL PROTECTED]; Tsz Wo Sze <[EMAIL PROTECTED]> Sent: Wednesday, February 20, 2013 12:16 PM Subject: Re: VOTE: HDFS-347 merge Hi Nicholas, I looked at your comments on the JIRA, and they all seem like trivial things that could be addressed post-merge, and none of them would affect the functionality. If Colin addresses these issues, will you amend your vote to +1 within the called-for voting period? It concerns me that we've been asking for reviews on this branch for multiple months now, and yet you're only bringing up some of these things now that a merge vote is called. Colin sentp a note to this list a month ago (http://markmail.org/message/phcfc3watwlqiemw) saying that the merge was coming soon. Since then, we found a few small bugs around the configuration/setup code, but all of the things you're bringing up in the review now have been in the branch since the new year, so I feel like there has been quite ample time for review. -Todd On Wed, Feb 20, 2013 at 11:56 AM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote: > -1 > The patch seems not ready yet. I have posted some comments/suggestions on the JIRA. Colin also has agreed that there are some bugs to be fixed. Sorry. > > Tsz-Wo > > > > > ________________________________ > From: Todd Lipcon <[EMAIL PROTECTED]> > To: [EMAIL PROTECTED] > Sent: Tuesday, February 19, 2013 4:11 PM > Subject: Re: VOTE: HDFS-347 merge > > +1 (binding) > > I code-reviewed almost all of the code in this branch, and also spent some > time benchmarking and testing under various workloads. We've also done > significant testing on clusters here at Cloudera, both secure and insecure, > and verified integration with a number of other ecosystem components (eg > Pig, Hive, Impala, HBase, MR, etc). The feature works as advertised and > should provide much better performance for a number of workloads, > especially in secure environments. > > Thanks for the hard work, Colin! > > -Todd > > On Sun, Feb 17, 2013 at 1:48 PM, Colin McCabe <[EMAIL PROTECTED]>wrote: > >> Hi all, >> >> I would like to merge the HDFS-347 branch back to trunk. It's been >> under intensive review and testing for several months. The branch >> adds a lot of new unit tests, and passes Jenkins as of 2/15 [1] >> >> We have tested HDFS-347 with both random and sequential workloads. The >> short-circuit case is substantially faster [2], and overall >> performance looks very good. This is especially encouraging given >> that the initial goal of this work was to make security compatible >> with short-circuit local reads, rather than to optimize the >> short-circuit code path. We've also stress-tested HDFS-347 on a >> number of clusters. >> >> This iniial VOTE is to merge only into trunk. Just as we have done >> with our other recent merges, we will consider merging into branch-2 >> after the code has been in trunk for few weeks. >> >> Please cast your vote by EOD Sunday 2/24. >> >> best, >> Colin McCabe >> >> [1] >> https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13579704&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13579704 >> >> [2] >> https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13551755&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13551755 >> > > > > -- > Todd Lipcon > Software Engineer, Cloudera Todd Lipcon Software Engineer, Cloudera
-
Re: VOTE: HDFS-347 mergeTodd Lipcon 2013-02-20, 22:27
The point isn't when the consolidated patches were posted to the JIRA.
The branch is on the public SVN and has been for months. The work was done incrementally on the branch, and you were welcome (and encouraged) to review it all along. The consolidated patches are mostly there in order to get a "Hadoop QA" run against the branch, since we can't currently run the QA bot on anything but trunk. -Todd On Wed, Feb 20, 2013 at 1:48 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote: > The previous patch of HDFS-347 was posted on Jan 31 > (2013.01.31.consolidated2.patch) I have tried to review it but the code is > quite unreadable at that time. Then, the next patch is the latest patch > 2013.02.15.consolidated4.patch posted in the evening of Feb 15, right before > the weekends. As mentioned previously, I did not get a chance to check it > until yesterday (Feb 19). > > The currently patch is still not yet ready. It seems to have unnecessarily > changed the API and protocol. I believe those are important but not trivial > things. > > Tsz-Wo > > > ________________________________ > From: Todd Lipcon <[EMAIL PROTECTED]> > To: [EMAIL PROTECTED]; Tsz Wo Sze <[EMAIL PROTECTED]> > Sent: Wednesday, February 20, 2013 12:16 PM > > Subject: Re: VOTE: HDFS-347 merge > > Hi Nicholas, > > I looked at your comments on the JIRA, and they all seem like trivial > things that could be addressed post-merge, and none of them would > affect the functionality. If Colin addresses these issues, will you > amend your vote to +1 within the called-for voting period? > > It concerns me that we've been asking for reviews on this branch for > multiple months now, and yet you're only bringing up some of these > things now that a merge vote is called. Colin sentp a note to this > list a month ago (http://markmail.org/message/phcfc3watwlqiemw) saying > that the merge was coming soon. Since then, we found a few small bugs > around the configuration/setup code, but all of the things you're > bringing up in the review now have been in the branch since the new > year, so I feel like there has been quite ample time for review. > > -Todd > > On Wed, Feb 20, 2013 at 11:56 AM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote: >> -1 >> The patch seems not ready yet. I have posted some comments/suggestions on >> the JIRA. Colin also has agreed that there are some bugs to be fixed. >> Sorry. >> >> Tsz-Wo >> >> >> >> >> ________________________________ >> From: Todd Lipcon <[EMAIL PROTECTED]> >> To: [EMAIL PROTECTED] >> Sent: Tuesday, February 19, 2013 4:11 PM >> Subject: Re: VOTE: HDFS-347 merge >> >> +1 (binding) >> >> I code-reviewed almost all of the code in this branch, and also spent some >> time benchmarking and testing under various workloads. We've also done >> significant testing on clusters here at Cloudera, both secure and >> insecure, >> and verified integration with a number of other ecosystem components (eg >> Pig, Hive, Impala, HBase, MR, etc). The feature works as advertised and >> should provide much better performance for a number of workloads, >> especially in secure environments. >> >> Thanks for the hard work, Colin! >> >> -Todd >> >> On Sun, Feb 17, 2013 at 1:48 PM, Colin McCabe >> <[EMAIL PROTECTED]>wrote: >> >>> Hi all, >>> >>> I would like to merge the HDFS-347 branch back to trunk. It's been >>> under intensive review and testing for several months. The branch >>> adds a lot of new unit tests, and passes Jenkins as of 2/15 [1] >>> >>> We have tested HDFS-347 with both random and sequential workloads. The >>> short-circuit case is substantially faster [2], and overall >>> performance looks very good. This is especially encouraging given >>> that the initial goal of this work was to make security compatible >>> with short-circuit local reads, rather than to optimize the >>> short-circuit code path. We've also stress-tested HDFS-347 on a >>> number of clusters. >>> >>> This iniial VOTE is to merge only into trunk. Just as we have done > Todd Lipcon Software Engineer, Cloudera
-
Re: VOTE: HDFS-347 mergeSuresh Srinivas 2013-02-20, 22:49
Todd,
Some of us have been trying to help test and review the code. However you might have missed the following, which has resulted in the review not completing: 02/06/13 - After intent for merge was sent, I posted comment saying consolidate patch has extraneous changes. That was non trivial amount of extraneous changes. 02/06/13 - Nicholas posted some comments and also indicated previous unaddressed comments. 02/15/13 - No update was made to consolidated patch. I stopped reviewing it waiting for the new patch. A new patch gets posted on 2/15 and soon after merge vote email on 2/17/13 during the long weekend. At this time, some of the comments that were made earlier have not been addressed. Also folks who were reviewing the consolidated patch have not posted +1. I think we should wait for +1 for the merge patch (from the folks actively reviewing the patch) before the merge vote. That might make this process smoother. But I agree, if the changes are deemed to be trivial, we can do it post merge to trunk. Regards, Suresh On Wed, Feb 20, 2013 at 12:16 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote: > Hi Nicholas, > > I looked at your comments on the JIRA, and they all seem like trivial > things that could be addressed post-merge, and none of them would > affect the functionality. If Colin addresses these issues, will you > amend your vote to +1 within the called-for voting period? > > It concerns me that we've been asking for reviews on this branch for > multiple months now, and yet you're only bringing up some of these > things now that a merge vote is called. Colin sentp a note to this > list a month ago (http://markmail.org/message/phcfc3watwlqiemw) saying > that the merge was coming soon. Since then, we found a few small bugs > around the configuration/setup code, but all of the things you're > bringing up in the review now have been in the branch since the new > year, so I feel like there has been quite ample time for review. > > -Todd > > On Wed, Feb 20, 2013 at 11:56 AM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote: > > -1 > > The patch seems not ready yet. I have posted some comments/suggestions > on the JIRA. Colin also has agreed that there are some bugs to be fixed. > Sorry. > > > > Tsz-Wo > > > > > > > > > > ________________________________ > > From: Todd Lipcon <[EMAIL PROTECTED]> > > To: [EMAIL PROTECTED] > > Sent: Tuesday, February 19, 2013 4:11 PM > > Subject: Re: VOTE: HDFS-347 merge > > > > +1 (binding) > > > > I code-reviewed almost all of the code in this branch, and also spent > some > > time benchmarking and testing under various workloads. We've also done > > significant testing on clusters here at Cloudera, both secure and > insecure, > > and verified integration with a number of other ecosystem components (eg > > Pig, Hive, Impala, HBase, MR, etc). The feature works as advertised and > > should provide much better performance for a number of workloads, > > especially in secure environments. > > > > Thanks for the hard work, Colin! > > > > -Todd > > > > On Sun, Feb 17, 2013 at 1:48 PM, Colin McCabe <[EMAIL PROTECTED] > >wrote: > > > >> Hi all, > >> > >> I would like to merge the HDFS-347 branch back to trunk. It's been > >> under intensive review and testing for several months. The branch > >> adds a lot of new unit tests, and passes Jenkins as of 2/15 [1] > >> > >> We have tested HDFS-347 with both random and sequential workloads. The > >> short-circuit case is substantially faster [2], and overall > >> performance looks very good. This is especially encouraging given > >> that the initial goal of this work was to make security compatible > >> with short-circuit local reads, rather than to optimize the > >> short-circuit code path. We've also stress-tested HDFS-347 on a > >> number of clusters. > >> > >> This iniial VOTE is to merge only into trunk. Just as we have done > >> with our other recent merges, we will consider merging into branch-2 > >> after the code has been in trunk for few weeks. http://hortonworks.com/download/
-
Re: VOTE: HDFS-347 mergeTsz Wo Sze 2013-02-20, 23:01
Also, the patch seems to have removed the existing short-circuit read feature (HDFS-2246). It is an incompatible change. I think the patch is farther away from being ready and I would keep my -1.
Tsz-Wo ________________________________ From: Tsz Wo Sze <[EMAIL PROTECTED]> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> Sent: Wednesday, February 20, 2013 11:56 AM Subject: Re: VOTE: HDFS-347 merge -1 The patch seems not ready yet. I have posted some comments/suggestions on the JIRA. Colin also has agreed that there are some bugs to be fixed. Sorry. Tsz-Wo ________________________________ From: Todd Lipcon <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] Sent: Tuesday, February 19, 2013 4:11 PM Subject: Re: VOTE: HDFS-347 merge +1 (binding) I code-reviewed almost all of the code in this branch, and also spent some time benchmarking and testing under various workloads. We've also done significant testing on clusters here at Cloudera, both secure and insecure, and verified integration with a number of other ecosystem components (eg Pig, Hive, Impala, HBase, MR, etc). The feature works as advertised and should provide much better performance for a number of workloads, especially in secure environments. Thanks for the hard work, Colin! -Todd On Sun, Feb 17, 2013 at 1:48 PM, Colin McCabe <[EMAIL PROTECTED]>wrote: > Hi all, > > I would like to merge the HDFS-347 branch back to trunk. It's been > under intensive review and testing for several months. The branch > adds a lot of new unit tests, and passes Jenkins as of 2/15 [1] > > We have tested HDFS-347 with both random and sequential workloads. The > short-circuit case is substantially faster [2], and overall > performance looks very good. This is especially encouraging given > that the initial goal of this work was to make security compatible > with short-circuit local reads, rather than to optimize the > short-circuit code path. We've also stress-tested HDFS-347 on a > number of clusters. > > This iniial VOTE is to merge only into trunk. Just as we have done > with our other recent merges, we will consider merging into branch-2 > after the code has been in trunk for few weeks. > > Please cast your vote by EOD Sunday 2/24. > > best, > Colin McCabe > > [1] > https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13579704&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13579704 > > [2] > https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13551755&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13551755 > -- Todd Lipcon Software Engineer, Cloudera
-
Re: VOTE: HDFS-347 mergeTodd Lipcon 2013-02-20, 23:01
On Wed, Feb 20, 2013 at 2:49 PM, Suresh Srinivas <[EMAIL PROTECTED]> wrote:
> Todd, > > Some of us have been trying to help test and review the code. However you > might have missed the following, which has resulted in the review not > completing: > > 02/06/13 - After intent for merge was sent, I posted comment saying > consolidate patch has extraneous changes. That was non trivial amount of > extraneous changes. This was just an error with the "consolidate merge patch". Like I said in the previous email, these patches are just for Jenkins QA to run on, and I assume that any HDFS committer is able to look at the branch itself to understand the changes in it. It's easy to accidentally end up with extraneous changes when you try to generate these merge patches - eg the same thing happened to you earlier this week on HADOOP-8562 if I'm not mistaken. > 02/06/13 - Nicholas posted some comments and also indicated previous > unaddressed comments. Colin addressed this feedback on 2/8 in https://issues.apache.org/jira/browse/HDFS-4476 . Nicholas chose not to review the changes (though acknowledged the JIRA on 2/6), so Aaron committed it a week later. > 02/15/13 - No update was made to consolidated patch. I stopped reviewing it > waiting for the new patch. A new patch gets posted on 2/15 and soon after > merge vote email on 2/17/13 during the long weekend. > Again, all the changes are in the branch. Again I can't imagine trying to review a merge of a branch by looking at a 400KB patch. They're just there to trigger Jenkins. It should be the responsibility of committers to look at the branch itself. Or if you prefer a single patch, it's trivial to generate one in your local repo. > At this time, some of the comments that were made earlier have not been > addressed. Also folks who were reviewing the consolidated patch have not > posted +1. It seems like the best way to trigger people actually reviewing branches is to call merge votes. I would have hoped that people would review the work as it went along. If we waited for a +1 without calling a merge vote, this would drag on for months and months. This is based on my experience with the 3 or 4 branches I've worked on. > > I think we should wait for +1 for the merge patch (from the folks actively > reviewing the patch) before the merge vote. That might make this process > smoother. But I agree, if the changes are deemed to be trivial, we can do > it post merge to trunk. The difficulty is defining "actively reviewing the patch". Making 3 or 4 cursory comments once every 2 weeks doesn't look like "active review" to me. On the other hand, I spent probably 40-50% of my time over the last month reviewing and testing this branch and have voted +1. -Todd > > > On Wed, Feb 20, 2013 at 12:16 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote: > >> Hi Nicholas, >> >> I looked at your comments on the JIRA, and they all seem like trivial >> things that could be addressed post-merge, and none of them would >> affect the functionality. If Colin addresses these issues, will you >> amend your vote to +1 within the called-for voting period? >> >> It concerns me that we've been asking for reviews on this branch for >> multiple months now, and yet you're only bringing up some of these >> things now that a merge vote is called. Colin sentp a note to this >> list a month ago (http://markmail.org/message/phcfc3watwlqiemw) saying >> that the merge was coming soon. Since then, we found a few small bugs >> around the configuration/setup code, but all of the things you're >> bringing up in the review now have been in the branch since the new >> year, so I feel like there has been quite ample time for review. >> >> -Todd >> >> On Wed, Feb 20, 2013 at 11:56 AM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote: >> > -1 >> > The patch seems not ready yet. I have posted some comments/suggestions >> on the JIRA. Colin also has agreed that there are some bugs to be fixed. >> Sorry. >> > >> > Tsz-Wo >> > >> > >> > >> > >> > ________________________________ Todd Lipcon Software Engineer, Cloudera
-
Re: VOTE: HDFS-347 mergeTodd Lipcon 2013-02-20, 23:06
On Wed, Feb 20, 2013 at 3:01 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote:
> Also, the patch seems to have removed the existing short-circuit read feature (HDFS-2246). It is an incompatible change. I think the patch is farther away from being ready and I would keep my -1. The existing short circuit feature is insecure and was always considered a stop-gap solution. If you read the history of that feature, you can find comments like https://issues.apache.org/jira/browse/HDFS-4476 where I pointed out that it's only a stop-gap solution and the only reason I didn't veto is that folks agreed to later replace it with the proper solution (HDFS-347). Given that the API is the same, and this is an implementation detail, it is not incompatible. There is no reason to keep the old implementation around: it is both slower and unusable in the vast majority of clusters, where the data directories are owned by an HDFS user, and users of the cluster run under other unix credentials. -Todd -- Todd Lipcon Software Engineer, Cloudera
-
Re: VOTE: HDFS-347 mergeTodd Lipcon 2013-02-20, 23:06
On Wed, Feb 20, 2013 at 3:06 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
> https://issues.apache.org/jira/browse/HDFS-4476 where I pointed out sorry, meant to link to: https://issues.apache.org/jira/browse/HDFS-2246?focusedCommentId=13102013&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13102013 clipboard fail... -- Todd Lipcon Software Engineer, Cloudera
-
Re: VOTE: HDFS-347 mergeTsz Wo Sze 2013-02-20, 23:08
The reason to keep it around is that the HDFS-347 only support Unix but not other OS.
Tsz-Wo ________________________________ From: Todd Lipcon <[EMAIL PROTECTED]> To: [EMAIL PROTECTED]; Tsz Wo Sze <[EMAIL PROTECTED]> Sent: Wednesday, February 20, 2013 3:06 PM Subject: Re: VOTE: HDFS-347 merge On Wed, Feb 20, 2013 at 3:01 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote: > Also, the patch seems to have removed the existing short-circuit read feature (HDFS-2246). It is an incompatible change. I think the patch is farther away from being ready and I would keep my -1. The existing short circuit feature is insecure and was always considered a stop-gap solution. If you read the history of that feature, you can find comments like https://issues.apache.org/jira/browse/HDFS-4476 where I pointed out that it's only a stop-gap solution and the only reason I didn't veto is that folks agreed to later replace it with the proper solution (HDFS-347). Given that the API is the same, and this is an implementation detail, it is not incompatible. There is no reason to keep the old implementation around: it is both slower and unusable in the vast majority of clusters, where the data directories are owned by an HDFS user, and users of the cluster run under other unix credentials. -Todd -- Todd Lipcon Software Engineer, Cloudera
-
Re: VOTE: HDFS-347 mergeTodd Lipcon 2013-02-20, 23:13
On Wed, Feb 20, 2013 at 3:08 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote:
> The reason to keep it around is that the HDFS-347 only support Unix but not > other OS. Given that this is an optimization, and we have a ton of optimizations which don't yet run on Windows, I don't think that should be considered. Additionally, the Windows support has not yet been merged, nor is it in any release, so this isn't a regression. I would be happy to review an addition to the HDFS-347 branch which addresses this issue. But I don't think we should be maintaining two codepaths for the sake of an optimization on a platform which is not yet officially supported on trunk, especially when the old code path is _insecure_ and thus unusable in most environments. Todd > > ________________________________ > From: Todd Lipcon <[EMAIL PROTECTED]> > To: [EMAIL PROTECTED]; Tsz Wo Sze <[EMAIL PROTECTED]> > Sent: Wednesday, February 20, 2013 3:06 PM > > Subject: Re: VOTE: HDFS-347 merge > > On Wed, Feb 20, 2013 at 3:01 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote: >> Also, the patch seems to have removed the existing short-circuit read >> feature (HDFS-2246). It is an incompatible change. I think the patch is >> farther away from being ready and I would keep my -1. > > The existing short circuit feature is insecure and was always > considered a stop-gap solution. If you read the history of that > feature, you can find comments like > https://issues.apache.org/jira/browse/HDFS-4476 where I pointed out > that it's only a stop-gap solution and the only reason I didn't veto > is that folks agreed to later replace it with the proper solution > (HDFS-347). > > Given that the API is the same, and this is an implementation detail, > it is not incompatible. There is no reason to keep the old > implementation around: it is both slower and unusable in the vast > majority of clusters, where the data directories are owned by an HDFS > user, and users of the cluster run under other unix credentials. > > -Todd > -- > Todd Lipcon > Software Engineer, Cloudera > > -- Todd Lipcon Software Engineer, Cloudera
-
Re: VOTE: HDFS-347 mergeSuresh Srinivas 2013-02-20, 23:19
>
> > This was just an error with the "consolidate merge patch". Like I said > in the previous email, these patches are just for Jenkins QA to run > on, and I assume that any HDFS committer is able to look at the branch > itself to understand the changes in it. It's easy to accidentally end > up with extraneous changes when you try to generate these merge > patches - eg the same thing happened to you earlier this week on > HADOOP-8562 if I'm not mistaken. > I actually look at the merge patch for reviewing all the changes going in, if I cannot track all the changes and the incremental updates that may not make sense if you do not follow the patches in the order they are committed. It did happen on HADOOP-8562. I did say that this for Jenkins but not for commit. > > > The difficulty is defining "actively reviewing the patch". Making 3 or > 4 cursory comments once every 2 weeks doesn't look like "active > review" to me. On the other hand, I spent probably 40-50% of my time > over the last month reviewing and testing this branch and have voted > +1. I am not sure where the confusion is coming form. Clearly there were folks reviewing the consolidated patch, as I have shown. Clearly there were comments being made based on the consolidated patch and tests were being done using the branch. When a feature is developed in a branch, each jira without the context of the previous patch may not even make sense. I find it convenient to review the consolidated patch.
-
Re: VOTE: HDFS-347 mergeSuresh Srinivas 2013-02-20, 23:31
On Wed, Feb 20, 2013 at 3:13 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
> On Wed, Feb 20, 2013 at 3:08 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote: > > The reason to keep it around is that the HDFS-347 only support Unix but > not > > other OS. > > Given that this is an optimization, and we have a ton of optimizations > which don't yet run on Windows, I don't think that should be > considered. Additionally, the Windows support has not yet been merged, > nor is it in any release, so this isn't a regression. > This is a critical functionality for HBase peformance and an optimization we consider very important to have. > > I would be happy to review an addition to the HDFS-347 branch which > addresses this issue. But I don't think we should be maintaining two > codepaths for the sake of an optimization on a platform which is not > yet officially supported on trunk, especially when the old code path > is _insecure_ and thus unusable in most environments. I have to disagree. No where in the jira or the design it is explicitly stated that the old short circuit functionality is being removed. My assumption has been that it will not be removed. As regards "officially supported", we have been doing windows development for more than a year. In fact branch-1-win is being used by a lot users. Given merging it to branch-1 requires first making it available in trunk, we have been doing a lot of work in branch-trunk-win. It is almost ready to be merged as well. I am -1 on removing existing short circuit until an alternative short circuit similar to HDFS-347 on all the platforms.
-
Re: VOTE: HDFS-347 mergeTodd Lipcon 2013-02-20, 23:40
On Wed, Feb 20, 2013 at 3:31 PM, Suresh Srinivas <[EMAIL PROTECTED]> wrote:
>> Given that this is an optimization, and we have a ton of optimizations >> which don't yet run on Windows, I don't think that should be >> considered. Additionally, the Windows support has not yet been merged, >> nor is it in any release, so this isn't a regression. >> > > This is a critical functionality for HBase peformance and an optimization > we consider > very important to have. Too bad it doesn't work in any of the normal installations... none of the packages for Hadoop would allow it to work, given that the data directories will be owned by HDFS and not world readable, and tasks/HBase would run as an "hbase" user, which wouldn't have direct access to the block files. >> >> I would be happy to review an addition to the HDFS-347 branch which >> addresses this issue. But I don't think we should be maintaining two >> codepaths for the sake of an optimization on a platform which is not >> yet officially supported on trunk, especially when the old code path >> is _insecure_ and thus unusable in most environments. > > > I have to disagree. No where in the jira or the design it is explicitly > stated that > the old short circuit functionality is being removed. My assumption has been > that it will not be removed. I've tried this avenue in the past on other insecurities which were fixed. Sorry if you were depending on insecure behavior. The project should move on and not have 3+ ways of implementing the same thing. > As regards "officially supported", we have been doing > windows development for > more than a year. In fact branch-1-win is being used by a lot users. Given > merging it to branch-1 requires first making it available in trunk, we have > been doing > a lot of work in branch-trunk-win. It is almost ready to be merged as well. > > I am -1 on removing existing short circuit until an alternative short > circuit similar > to HDFS-347 on all the platforms. Great -- are you committed to building this equivalent feature for Windows, then? On what timeline? From my viewpoint, Windows isn't a supported platform *right now*, so vetoing based on it seems meritless. BTW, the posix_fadvise based readahead is an important optimization for many workloads, too, but from what I can tell looking at the Windows branch, it doesn't support it. There are other places in the Windows branch where performance is going to be worse - eg disabling the pipelined crc32c implementation will be a 2-3x hit on that code path. No one has voted to merge Windows support, and if merging Windows support means that, from now on, every _optimization_ must work on Windows, I don't think I will be able to vote +1. The vast majority of the community is _not_ running Windows, and I don't want to block progress on the small number of developers who know how to program against that platform. If that's the axe hanging over our head with the Windows branch, then I'm all for saying "good, keep it on a branch and don't merge it to trunk". I was hoping we could all work together a bit better here... contentious merge votes like this just cause cases where different distros diverge by merging different branches way ahead of the upstream (eg yours with windows support, ours with 347, etc) -Todd -- Todd Lipcon Software Engineer, Cloudera
-
Re: VOTE: HDFS-347 mergeSuresh Srinivas 2013-02-21, 00:04
>
> > > > I have to disagree. No where in the jira or the design it is explicitly > > stated that > > the old short circuit functionality is being removed. My assumption has > been > > that it will not be removed. > > I've tried this avenue in the past on other insecurities which were > fixed. Sorry if you were depending on insecure behavior. The project > should move on and not have 3+ ways of implementing the same thing. > Todd, can you please show in the jira or design document where explicitly it is stated that old short circuit will be removed. I agree that we need to move past the old short circuit, but not now. Great -- are you committed to building this equivalent feature for > Windows, then? On what timeline? Yes. I plan to get the equivalent functionality in a couple of months, once the existing work on snapshots and some HA related cleanup, RPC cleanup completes. I plan to make it available in a dot release after 2.x goes GA. > From my viewpoint, Windows isn't a > supported platform *right now*, so vetoing based on it seems > meritless. > I disagree. If I had timed branch-trunk-win merge before HDFS-347, then would you consider windows a supported platform? You know that more than a years worth of work has gone into windows support, all in the public. > > BTW, the posix_fadvise based readahead is an important optimization > for many workloads, too, but from what I can tell looking at the > Windows branch, it doesn't support it. There are other places in the > Windows branch where performance is going to be worse - eg disabling > the pipelined crc32c implementation will be a 2-3x hit on that code > path. > We will add similar optimizations as well. You are not seeing the subtle difference. You are removing a functionality that is already working on windows. These optimizations are not available on windows currently, given in Hadoop we have been making optimizations for *nix for a long time. They will become available in due course of time. We could certainly discuss removing this functionality then. I also offer to support two code paths until then. > > No one has voted to merge Windows support, and if merging Windows > support means that, from now on, every _optimization_ must work on > Windows, I don't think I will be able to vote +1. The vast majority of > the community is _not_ running Windows, and I don't want to block > progress on the small number of developers who know how to program > against that platform. > I have only sent heads up email about coming merge. It is not an email asking for vote. The vast majority of the community is not running windows because it was not supported well. That is changing with the work that is going on in windows branch. Again you seem to be misunderstanding my comment. I am not asking for *every_optimization_must work on windows*. You are free to optimize for a platform as you want. All I am asking for not removing an optimization that is already available on windows. > If that's the axe hanging over our head with the Windows branch, then > I'm all for saying "good, keep it on a branch and don't merge it to > trunk". > I was hoping we could all work together a bit better here... > contentious merge votes like this just cause cases where different > distros diverge by merging different branches way ahead of the > upstream (eg yours with windows support, ours with 347, etc) HDFS-347 does not clearly state old short circuit will be removed any where in the jira or design. If this was made clear in the jira, this discussion would have happened much earlier than now. You seem to be taking the comments I am making the wrong way. I am supportive of this work. In fact as you see some of us have spent time testing this work and have reviewed the code. Regards, Suresh -- http://hortonworks.com/download/
-
Re: VOTE: HDFS-347 mergeTodd Lipcon 2013-02-21, 00:12
On Wed, Feb 20, 2013 at 4:04 PM, Suresh Srinivas <[EMAIL PROTECTED]> wrote:
> > HDFS-347 does not clearly state old short circuit will be removed any where > in the jira or design. If this was made clear in the jira, this discussion > would > have happened much earlier than now. > > You seem to be taking the comments I am making the wrong way. I am > supportive of this work. In fact as you see some of us have spent time > testing this work and have reviewed the code. The patches even going back as far as last September have all removed the old code path. I sort of assumed that, if you are taking time to review the patches, you would have noticed this... additionally, Colin's comments on the JIRA said as much... eg: "The old RPC is now deprecated and will always throw an AccessControlException, so that older clients will fall back to remote reads." "BlockReaderLocal: simpler implementation that uses raw FileChannel objects. We don't need to cache anything, or make RPCs to the DataNode." from his 10/1/2012 patch upload. So, any patch you might have looked at since then would have clearly removed the old code path. -Todd -- Todd Lipcon Software Engineer, Cloudera
-
Re: VOTE: HDFS-347 mergeSuresh Srinivas 2013-02-21, 00:28
>
> The patches even going back as far as last September have all removed > the old code path. I sort of assumed that, if you are taking time to > review the patches, you would have noticed this... additionally, > Colin's comments on the JIRA said as much... eg: > Todd, we have different ways of reviewing and tracking work that happens in a branch. In fact if you look at snapshots branch, a functionality was coded and optimized/rewritten multiple times. My approach is to review final consolidated patch. That is the reason why I have been keeping HADOOP-8562 updated, so reviewers can see the latest patch. > "The old RPC is now deprecated and will always throw an > AccessControlException, so that older clients will fall back to remote > reads." > "BlockReaderLocal: simpler implementation that uses raw FileChannel > objects. We don't need to cache anything, or make RPCs to the > DataNode." Sorry this is not explicit. HDFS-347 has been around for a long time. Stating "HDFS-2246 short circuit mechanism will be removed from the client" is what I call explicitly stating it. To summarize I want to retain old short circuit and will spend time maintaining it for the following reasons: - It is a functionality that is already available and it supports all all the platforms. - I know great deal of testing has gone into HDFS-347. But if any issue is discovered, it is good to have old functionality as a fallback option. Please consider these thoughts and do not take it as impeding the work. I am actually trying to help here. Regards, Suresh -- http://hortonworks.com/download/
-
Re: VOTE: HDFS-347 mergeChris Douglas 2013-02-21, 00:29
The throughput on this thread is too high.
Nicholas/Suresh: do either of you disagree with the general approach in HDFS-347? Holding an inevitable branch open causes a lot of pain (Suresh, you endured much of that personally with security, which had a lot of follow-on work). While it makes sense to block HDFS-347 from 2.x before all concerns are addressed, are most objections so fundamental that the merge to trunk should be prevented (except for HDFS-2246)? Is there related development that will be impacted? Colin won't disappear after this is committed, so if the answers to these questions are "no", then let's move forward. Todd: Yes, the idea that the project only allows optimizations that work on all platforms is idiotic, which is why nobody has suggested it. Given that HDFS-347 is a strictly better approach, once committed, there will be ample motivation to add support for other OSes and remove HDFS-2246 entirely. Nobody is confused about this. There's ample precedent for retaining obscure, clumsy features as a temporary stop-gap (e.g., service plugins, opaque blobs of bytes in Tasks, configurable combiner semantics). What's the virtue of insisting on removing this? Unless there was a lot of follow-on work, HDFS-2246 doesn't look like a lot of code... -C On Wed, Feb 20, 2013 at 3:40 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote: > On Wed, Feb 20, 2013 at 3:31 PM, Suresh Srinivas <[EMAIL PROTECTED]> wrote: >>> Given that this is an optimization, and we have a ton of optimizations >>> which don't yet run on Windows, I don't think that should be >>> considered. Additionally, the Windows support has not yet been merged, >>> nor is it in any release, so this isn't a regression. >>> >> >> This is a critical functionality for HBase peformance and an optimization >> we consider >> very important to have. > > Too bad it doesn't work in any of the normal installations... none of > the packages for Hadoop would allow it to work, given that the data > directories will be owned by HDFS and not world readable, and > tasks/HBase would run as an "hbase" user, which wouldn't have direct > access to the block files. > >>> >>> I would be happy to review an addition to the HDFS-347 branch which >>> addresses this issue. But I don't think we should be maintaining two >>> codepaths for the sake of an optimization on a platform which is not >>> yet officially supported on trunk, especially when the old code path >>> is _insecure_ and thus unusable in most environments. >> >> >> I have to disagree. No where in the jira or the design it is explicitly >> stated that >> the old short circuit functionality is being removed. My assumption has been >> that it will not be removed. > > I've tried this avenue in the past on other insecurities which were > fixed. Sorry if you were depending on insecure behavior. The project > should move on and not have 3+ ways of implementing the same thing. > >> As regards "officially supported", we have been doing >> windows development for >> more than a year. In fact branch-1-win is being used by a lot users. Given >> merging it to branch-1 requires first making it available in trunk, we have >> been doing >> a lot of work in branch-trunk-win. It is almost ready to be merged as well. >> >> I am -1 on removing existing short circuit until an alternative short >> circuit similar >> to HDFS-347 on all the platforms. > > Great -- are you committed to building this equivalent feature for > Windows, then? On what timeline? From my viewpoint, Windows isn't a > supported platform *right now*, so vetoing based on it seems > meritless. > > BTW, the posix_fadvise based readahead is an important optimization > for many workloads, too, but from what I can tell looking at the > Windows branch, it doesn't support it. There are other places in the > Windows branch where performance is going to be worse - eg disabling > the pipelined crc32c implementation will be a 2-3x hit on that code > path. > > No one has voted to merge Windows support, and if merging Windows On Wed, Feb 20, 2013 at 3:40 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
-
Re: VOTE: HDFS-347 mergeSuresh Srinivas 2013-02-21, 00:47
On Wed, Feb 20, 2013 at 4:29 PM, Chris Douglas <[EMAIL PROTECTED]> wrote:
> The throughput on this thread is too high. > > Nicholas/Suresh: do either of you disagree with the general approach > in HDFS-347? Holding an inevitable branch open causes a lot of pain > (Suresh, you endured much of that personally with security, which had > a lot of follow-on work). While it makes sense to block HDFS-347 from > 2.x before all concerns are addressed, are most objections so > fundamental that the merge to trunk should be prevented (except for > HDFS-2246)? Is there related development that will be impacted? Colin > won't disappear after this is committed, so if the answers to these > questions are "no", then let's move forward. > Yes. I have already stated this earlier. "... if the changes are deemed to be trivial, we can do it post merge to trunk." > Todd: Yes, the idea that the project only allows optimizations that > work on all platforms is idiotic, which is why nobody has suggested > it. Given that HDFS-347 is a strictly better approach, once committed, > there will be ample motivation to add support for other OSes and > remove HDFS-2246 entirely. Nobody is confused about this. There's > ample precedent for retaining obscure, clumsy features as a temporary > stop-gap (e.g., service plugins, opaque blobs of bytes in Tasks, > configurable combiner semantics). What's the virtue of insisting on > removing this? Unless there was a lot of follow-on work, HDFS-2246 > doesn't look like a lot of code... -C > > On Wed, Feb 20, 2013 at 3:40 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote: > > On Wed, Feb 20, 2013 at 3:31 PM, Suresh Srinivas <[EMAIL PROTECTED]> > wrote: > >>> Given that this is an optimization, and we have a ton of optimizations > >>> which don't yet run on Windows, I don't think that should be > >>> considered. Additionally, the Windows support has not yet been merged, > >>> nor is it in any release, so this isn't a regression. > >>> > >> > >> This is a critical functionality for HBase peformance and an > optimization > >> we consider > >> very important to have. > > > > Too bad it doesn't work in any of the normal installations... none of > > the packages for Hadoop would allow it to work, given that the data > > directories will be owned by HDFS and not world readable, and > > tasks/HBase would run as an "hbase" user, which wouldn't have direct > > access to the block files. > > > >>> > >>> I would be happy to review an addition to the HDFS-347 branch which > >>> addresses this issue. But I don't think we should be maintaining two > >>> codepaths for the sake of an optimization on a platform which is not > >>> yet officially supported on trunk, especially when the old code path > >>> is _insecure_ and thus unusable in most environments. > >> > >> > >> I have to disagree. No where in the jira or the design it is explicitly > >> stated that > >> the old short circuit functionality is being removed. My assumption has > been > >> that it will not be removed. > > > > I've tried this avenue in the past on other insecurities which were > > fixed. Sorry if you were depending on insecure behavior. The project > > should move on and not have 3+ ways of implementing the same thing. > > > >> As regards "officially supported", we have been doing > >> windows development for > >> more than a year. In fact branch-1-win is being used by a lot users. > Given > >> merging it to branch-1 requires first making it available in trunk, we > have > >> been doing > >> a lot of work in branch-trunk-win. It is almost ready to be merged as > well. > >> > >> I am -1 on removing existing short circuit until an alternative short > >> circuit similar > >> to HDFS-347 on all the platforms. > > > > Great -- are you committed to building this equivalent feature for > > Windows, then? On what timeline? From my viewpoint, Windows isn't a > > supported platform *right now*, so vetoing based on it seems > > meritless. > > > > BTW, the posix_fadvise based readahead is an important optimization http://hortonworks.com/download/
-
Re: VOTE: HDFS-347 mergeAaron T. Myers 2013-02-21, 01:12
On Wed, Feb 20, 2013 at 4:29 PM, Chris Douglas <[EMAIL PROTECTED]> wrote:
> Given that HDFS-347 is a strictly better approach, once committed, > there will be ample motivation to add support for other OSes and > remove HDFS-2246 entirely. Nobody is confused about this. There's > ample precedent for retaining obscure, clumsy features as a temporary > stop-gap (e.g., service plugins, opaque blobs of bytes in Tasks, > configurable combiner semantics). What's the virtue of insisting on > removing this? Unless there was a lot of follow-on work, HDFS-2246 > doesn't look like a lot of code... > Though it's not a ton of code, I think that having to support a more complex fallback path (i.e. try the HDFS-347 method, then fall back to trying the HDFS-2246 method, then fall back to doing normal TCP reads to the local DN) will make the code quite a bit hairier for little added benefit. How about this proposal for a compromise: Given that the only substantive concerns with HDFS-347 seem to be about Windows support for local reads, for now we only merge this branch to trunk. Support for doing HDFS-2246 style local reads will be removed from trunk, but retained in branch-2 for now. Only once someone adds support for doing HDFS-347 style local reads which work on Windows will we consider merging HDFS-347 to branch-2. This should ensure that there's no feature regression on branch-2, but also means that we will not need to maintain the HDFS-2246 code path alongside the HDFS-347 code path indefinitely. -- Aaron T. Myers Software Engineer, Cloudera
-
Re: VOTE: HDFS-347 mergeAndrew Purtell 2013-02-21, 01:32
> Only once someone adds support for doing HDFS-347 style local reads which
work on Windows will we consider merging HDFS-347 to branch-2. There's no chance of having both HDFS-347 and HDFS-2246 style local reads coexisting in branch-2? It would be nice if HDFS-347 was in branch-2 sooner rather than later. On Wed, Feb 20, 2013 at 5:12 PM, Aaron T. Myers <[EMAIL PROTECTED]> wrote: > On Wed, Feb 20, 2013 at 4:29 PM, Chris Douglas <[EMAIL PROTECTED]> > wrote: > > > Given that HDFS-347 is a strictly better approach, once committed, > > there will be ample motivation to add support for other OSes and > > remove HDFS-2246 entirely. Nobody is confused about this. There's > > ample precedent for retaining obscure, clumsy features as a temporary > > stop-gap (e.g., service plugins, opaque blobs of bytes in Tasks, > > configurable combiner semantics). What's the virtue of insisting on > > removing this? Unless there was a lot of follow-on work, HDFS-2246 > > doesn't look like a lot of code... > > > > Though it's not a ton of code, I think that having to support a more > complex fallback path (i.e. try the HDFS-347 method, then fall back to > trying the HDFS-2246 method, then fall back to doing normal TCP reads to > the local DN) will make the code quite a bit hairier for little added > benefit. > > How about this proposal for a compromise: > > Given that the only substantive concerns with HDFS-347 seem to be about > Windows support for local reads, for now we only merge this branch to > trunk. Support for doing HDFS-2246 style local reads will be removed from > trunk, but retained in branch-2 for now. Only once someone adds support for > doing HDFS-347 style local reads which work on Windows will we consider > merging HDFS-347 to branch-2. This should ensure that there's no feature > regression on branch-2, but also means that we will not need to maintain > the HDFS-2246 code path alongside the HDFS-347 code path indefinitely. > > -- > Aaron T. Myers > Software Engineer, Cloudera > -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
-
Re: VOTE: HDFS-347 mergeSuresh Srinivas 2013-02-21, 15:40
On Wed, Feb 20, 2013 at 5:12 PM, Aaron T. Myers <[EMAIL PROTECTED]> wrote:
> On Wed, Feb 20, 2013 at 4:29 PM, Chris Douglas <[EMAIL PROTECTED]> > wrote: > > > Given that HDFS-347 is a strictly better approach, once committed, > > there will be ample motivation to add support for other OSes and > > remove HDFS-2246 entirely. Nobody is confused about this. There's > > ample precedent for retaining obscure, clumsy features as a temporary > > stop-gap (e.g., service plugins, opaque blobs of bytes in Tasks, > > configurable combiner semantics). What's the virtue of insisting on > > removing this? Unless there was a lot of follow-on work, HDFS-2246 > > doesn't look like a lot of code... > > > > Though it's not a ton of code, I think that having to support a more > complex fallback path (i.e. try the HDFS-347 method, then fall back to > trying the HDFS-2246 method, then fall back to doing normal TCP reads to > the local DN) will make the code quite a bit hairier for little added > benefit. > This could be simpler. If HDFS-347 is configured, then use that method. If it does not work, then fall back to normal TCP reads. If HDFS-2246 method is configured the current functionality works as is. HDFS-347 overrides HDFS-2246 if both are configured. That means when configuring HDFS-2246, the user also needs to turn off HDFS-347. -- http://hortonworks.com/download/
-
Re: VOTE: HDFS-347 mergeChris Douglas 2013-02-21, 21:24
On Wed, Feb 20, 2013 at 5:12 PM, Aaron T. Myers <[EMAIL PROTECTED]> wrote:
> Given that the only substantive concerns with HDFS-347 seem to be about > Windows support for local reads, for now we only merge this branch to > trunk. Support for doing HDFS-2246 style local reads will be removed from > trunk, but retained in branch-2 for now. Only once someone adds support for > doing HDFS-347 style local reads which work on Windows will we consider > merging HDFS-347 to branch-2. This should ensure that there's no feature > regression on branch-2, but also means that we will not need to maintain > the HDFS-2246 code path alongside the HDFS-347 code path indefinitely. This seems reasonable, though retaining HDFS-2246 in branch-2 could be a workaround if a Windows port of HDFS-347 is not forthcoming. -C
-
Re: VOTE: HDFS-347 mergeTsz Wo Sze 2013-02-21, 22:15
On Wed, Feb 20, 2013 at 5:12 PM, Aaron T. Myers <[EMAIL PROTECTED]> wrote:
> Given that the only substantive concerns with HDFS-347 seem to be about > Windows support for local reads, for now we only merge this branch to trunk. Another substantive concern is that HDFS-347 is not as well tested as HDFS-2246. So, we should keep HDFS-2246 around for sometime and remove it later. Is this the usual practice? HDFS-347 and HDFS-2246 are not mutually exclusive. I don't understand why we must remove HDFS-2246 immediately but not allow them to coexist. Tsz-Wo
-
Re: VOTE: HDFS-347 mergeColin McCabe 2013-02-22, 19:13
On Thu, Feb 21, 2013 at 1:24 PM, Chris Douglas <[EMAIL PROTECTED]> wrote:
> On Wed, Feb 20, 2013 at 5:12 PM, Aaron T. Myers <[EMAIL PROTECTED]> wrote: >> Given that the only substantive concerns with HDFS-347 seem to be about >> Windows support for local reads, for now we only merge this branch to >> trunk. Support for doing HDFS-2246 style local reads will be removed from >> trunk, but retained in branch-2 for now. Only once someone adds support for >> doing HDFS-347 style local reads which work on Windows will we consider >> merging HDFS-347 to branch-2. This should ensure that there's no feature >> regression on branch-2, but also means that we will not need to maintain >> the HDFS-2246 code path alongside the HDFS-347 code path indefinitely. > > This seems reasonable, though retaining HDFS-2246 in branch-2 could be > a workaround if a Windows port of HDFS-347 is not forthcoming. -C This seems like a reasonable solution to me. To implement this on Windows, you would use the DuplicateHandle or WSADuplicateSocket APIs on that platform. I think the longer we delay the merge, the less time we will actually have to implement secure short-circuit on Windows. So we should not delay it any longer. best, Colin
-
Re: VOTE: HDFS-347 mergeEli Collins 2013-02-22, 21:55
On Thu, Feb 21, 2013 at 1:24 PM, Chris Douglas <[EMAIL PROTECTED]> wrote:
> On Wed, Feb 20, 2013 at 5:12 PM, Aaron T. Myers <[EMAIL PROTECTED]> wrote: >> Given that the only substantive concerns with HDFS-347 seem to be about >> Windows support for local reads, for now we only merge this branch to >> trunk. Support for doing HDFS-2246 style local reads will be removed from >> trunk, but retained in branch-2 for now. Only once someone adds support for >> doing HDFS-347 style local reads which work on Windows will we consider >> merging HDFS-347 to branch-2. This should ensure that there's no feature >> regression on branch-2, but also means that we will not need to maintain >> the HDFS-2246 code path alongside the HDFS-347 code path indefinitely. > > This seems reasonable, though retaining HDFS-2246 in branch-2 could be > a workaround if a Windows port of HDFS-347 is not forthcoming. -C This seems reasonable to me as well. Nicholas, Suresh - how does this sound to you?
-
Re: VOTE: HDFS-347 mergeTsz Wo Sze 2013-02-23, 02:32
Eli, please see my earlier response below.
On Wed, Feb 20, 2013 at 5:12 PM, Aaron T. Myers <[EMAIL PROTECTED]> wrote: > Given that the only substantive concerns with HDFS-347 seem to be about > Windows support for local reads, for now we only merge this branch to trunk. Another substantive concern is that HDFS-347 is not as well tested as HDFS-2246. So, we should keep HDFS-2246 around for sometime and remove it later. Is this the usual practice? HDFS-347 and HDFS-2246 are not mutually exclusive. I don't understand why we must remove HDFS-2246 immediately but not allow them to coexist. Tsz-Wo ________________________________ From: Eli Collins <[EMAIL PROTECTED]> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> Sent: Friday, February 22, 2013 1:55 PM Subject: Re: VOTE: HDFS-347 merge On Thu, Feb 21, 2013 at 1:24 PM, Chris Douglas <[EMAIL PROTECTED]> wrote: > On Wed, Feb 20, 2013 at 5:12 PM, Aaron T. Myers <[EMAIL PROTECTED]> wrote: >> Given that the only substantive concerns with HDFS-347 seem to be about >> Windows support for local reads, for now we only merge this branch to >> trunk. Support for doing HDFS-2246 style local reads will be removed from >> trunk, but retained in branch-2 for now. Only once someone adds support for >> doing HDFS-347 style local reads which work on Windows will we consider >> merging HDFS-347 to branch-2. This should ensure that there's no feature >> regression on branch-2, but also means that we will not need to maintain >> the HDFS-2246 code path alongside the HDFS-347 code path indefinitely. > > This seems reasonable, though retaining HDFS-2246 in branch-2 could be > a workaround if a Windows port of HDFS-347 is not forthcoming. -C This seems reasonable to me as well. Nicholas, Suresh - how does this sound to you?
-
Re: VOTE: HDFS-347 mergeAaron T. Myers 2013-02-23, 02:40
On Fri, Feb 22, 2013 at 6:32 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote:
> Another > substantive concern is that HDFS-347 is not as well tested as > HDFS-2246. So, we should keep HDFS-2246 around for sometime and remove > it later. Is this the usual practice? > I'm proposing we do just that - keep HDFS-2246 around in branch-2 to let HDFS-347 soak a bit on trunk and then remove HDFS-2246 from branch-2 once we're confident in HDFS-347 and trunk adds Windows support. As Colin pointed out, this VOTE has always been about only merging this branch to trunk. -- Aaron T. Myers Software Engineer, Cloudera
-
Re: VOTE: HDFS-347 mergeTsz Wo Sze 2013-02-24, 00:23
I still do not see a valid reason to remove HDFS-2246 immediately. Some users may have insecure clusters and they don't want to change their configuration.
BTW, is Unix Domain Socket supported by all Unix-like systems? Does anyone can confirm that or show some counterexamples? Tsz-Wo ________________________________ From: Aaron T. Myers <[EMAIL PROTECTED]> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; Tsz Wo Sze <[EMAIL PROTECTED]> Sent: Friday, February 22, 2013 6:40 PM Subject: Re: VOTE: HDFS-347 merge On Fri, Feb 22, 2013 at 6:32 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote: > Another > substantive concern is that HDFS-347 is not as well tested as > HDFS-2246. So, we should keep HDFS-2246 around for sometime and remove > it later. Is this the usual practice? > I'm proposing we do just that - keep HDFS-2246 around in branch-2 to let HDFS-347 soak a bit on trunk and then remove HDFS-2246 from branch-2 once we're confident in HDFS-347 and trunk adds Windows support. As Colin pointed out, this VOTE has always been about only merging this branch to trunk. -- Aaron T. Myers Software Engineer, Cloudera
-
Re: VOTE: HDFS-347 mergeEli Collins 2013-02-25, 18:24
On Sat, Feb 23, 2013 at 4:23 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote:
> I still do not see a valid reason to remove HDFS-2246 immediately. Some users may have insecure clusters and they don't want to change their configuration. Because it doesn't make sense to support multiple mechanisms for the same thing. 2246 was always intended to be a *short term solution* util 347 was completed, eg see Sanjay's first comment on 2246: "A shortcut has been proposed where the client access the hdfs file blocks directly... This is non-invasive and is a good short term solution till HDFS-347 is completed." Thanks, Eli
-
Re: VOTE: HDFS-347 mergeColin McCabe 2013-02-25, 18:31
On Sat, Feb 23, 2013 at 4:23 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote:
> I still do not see a valid reason to remove HDFS-2246 immediately. Some users may have insecure clusters and they don't want to change their configuration. > > BTW, is Unix Domain Socket supported by all Unix-like systems? Does anyone can confirm that or show some counterexamples? UNIX domain sockets are supported on MacOS, FreeBSD, OpenBSD, NetBSD, Linux, etc etc. Their behavior is standardized by POSIX and implemented by all POSIX OSes... although there are some OS-specific behaviors which we avoid. Colin > > > Tsz-Wo > > > > ________________________________ > From: Aaron T. Myers <[EMAIL PROTECTED]> > To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; Tsz Wo Sze <[EMAIL PROTECTED]> > Sent: Friday, February 22, 2013 6:40 PM > Subject: Re: VOTE: HDFS-347 merge > > On Fri, Feb 22, 2013 at 6:32 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote: > >> Another >> substantive concern is that HDFS-347 is not as well tested as >> HDFS-2246. So, we should keep HDFS-2246 around for sometime and remove >> it later. Is this the usual practice? >> > > I'm proposing we do just that - keep HDFS-2246 around in branch-2 to let > HDFS-347 soak a bit on trunk and then remove HDFS-2246 from branch-2 once > we're confident in HDFS-347 and trunk adds Windows support. As Colin > pointed out, this VOTE has always been about only merging this branch to > trunk. > > > -- > Aaron T. Myers > Software Engineer, Cloudera
-
Re: VOTE: HDFS-347 mergeTsz Wo Sze 2013-02-25, 20:50
I agree that HDFS-2246 is a short term solution and we should not keep it there forever. However, we still need a transition period to replace an old mechanism by a new one. No?
Tsz-Wo ________________________________ From: Eli Collins <[EMAIL PROTECTED]> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; Tsz Wo Sze <[EMAIL PROTECTED]> Sent: Monday, February 25, 2013 10:24 AM Subject: Re: VOTE: HDFS-347 merge On Sat, Feb 23, 2013 at 4:23 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote: > I still do not see a valid reason to remove HDFS-2246 immediately. Some users may have insecure clusters and they don't want to change their configuration. Because it doesn't make sense to support multiple mechanisms for the same thing. 2246 was always intended to be a *short term solution* util 347 was completed, eg see Sanjay's first comment on 2246: "A shortcut has been proposed where the client access the hdfs file blocks directly... This is non-invasive and is a good short term solution till HDFS-347 is completed." Thanks, Eli
-
Re: VOTE: HDFS-347 mergeEli Collins 2013-02-25, 21:16
I think we need a transition period when any kinks are worked out of
347 but I don't think we need one alpha/beta release where both mechanisms are supported (because 2246 was just a short term solution rather than a long term commitment). Ideally we'd get 347 in branch-2 for 2.0.4-beta and have that release to address issues that come up to fix for GA. Cloudera is actively testing 347 and parts of the community are eager to pick it up so I think that would work out timing wise. Reasonable? On Mon, Feb 25, 2013 at 12:50 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote: > I agree that HDFS-2246 is a short term solution and we should not keep it there forever. However, we still need a transition period to replace an old mechanism by a new one. No? > > Tsz-Wo > > > > > ________________________________ > From: Eli Collins <[EMAIL PROTECTED]> > To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; Tsz Wo Sze <[EMAIL PROTECTED]> > Sent: Monday, February 25, 2013 10:24 AM > Subject: Re: VOTE: HDFS-347 merge > > On Sat, Feb 23, 2013 at 4:23 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote: >> I still do not see a valid reason to remove HDFS-2246 immediately. Some users may have insecure clusters and they don't want to change their configuration. > > Because it doesn't make sense to support multiple mechanisms for the > same thing. > > 2246 was always intended to be a *short term solution* util 347 was > completed, eg see Sanjay's first comment on 2246: "A shortcut has > been proposed where the client access the hdfs file blocks directly... > This is non-invasive and is a good short term solution till HDFS-347 > is completed." > > Thanks, > Eli
-
Re: VOTE: HDFS-347 mergeChris Douglas 2013-02-25, 21:50
On Mon, Feb 25, 2013 at 1:16 PM, Eli Collins <[EMAIL PROTECTED]> wrote:
> I think we need a transition period when any kinks are worked out of > 347 but I don't think we need one alpha/beta release where both > mechanisms are supported (because 2246 was just a short term solution > rather than a long term commitment). Ideally we'd get 347 in branch-2 > for 2.0.4-beta and have that release to address issues that come up to > fix for GA. Cloudera is actively testing 347 and parts of the > community are eager to pick it up so I think that would work out > timing wise. Reasonable? ATM's suggestion of removing HDFS-2246 in trunk, but not branch-2, is a rational compromise: it allows some period for others to adapt, but not an indefinite one. It's not clear what you're proposing, if anything. Nicholas/Suresh: have you had a chance to review HDFS-347, yet? -C > On Mon, Feb 25, 2013 at 12:50 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote: >> I agree that HDFS-2246 is a short term solution and we should not keep it there forever. However, we still need a transition period to replace an old mechanism by a new one. No? >> >> Tsz-Wo >> >> >> >> >> ________________________________ >> From: Eli Collins <[EMAIL PROTECTED]> >> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; Tsz Wo Sze <[EMAIL PROTECTED]> >> Sent: Monday, February 25, 2013 10:24 AM >> Subject: Re: VOTE: HDFS-347 merge >> >> On Sat, Feb 23, 2013 at 4:23 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote: >>> I still do not see a valid reason to remove HDFS-2246 immediately. Some users may have insecure clusters and they don't want to change their configuration. >> >> Because it doesn't make sense to support multiple mechanisms for the >> same thing. >> >> 2246 was always intended to be a *short term solution* util 347 was >> completed, eg see Sanjay's first comment on 2246: "A shortcut has >> been proposed where the client access the hdfs file blocks directly... >> This is non-invasive and is a good short term solution till HDFS-347 >> is completed." >> >> Thanks, >> Eli
-
Re: VOTE: HDFS-347 mergeEli Collins 2013-02-25, 22:01
On Mon, Feb 25, 2013 at 1:50 PM, Chris Douglas <[EMAIL PROTECTED]> wrote:
> On Mon, Feb 25, 2013 at 1:16 PM, Eli Collins <[EMAIL PROTECTED]> wrote: >> I think we need a transition period when any kinks are worked out of >> 347 but I don't think we need one alpha/beta release where both >> mechanisms are supported (because 2246 was just a short term solution >> rather than a long term commitment). Ideally we'd get 347 in branch-2 >> for 2.0.4-beta and have that release to address issues that come up to >> fix for GA. Cloudera is actively testing 347 and parts of the >> community are eager to pick it up so I think that would work out >> timing wise. Reasonable? > > ATM's suggestion of removing HDFS-2246 in trunk, but not branch-2, is > a rational compromise: it allows some period for others to adapt, but > not an indefinite one. It's not clear what you're proposing, if > anything. I'm not proposing anything new, Nicholas said he had some concerns with ATM's proposal and we're discussing them. Specifically, ATM's proposal does not allow for a single release that contains both 347 and 2246 (he proposes removing 2246 from branch-2 when 347 is ready). I think Nicholas was saying that we should not remove 2246 immediately for the sake of a transition period (which I interpret to mean a release that supports both), I responded saying that the transition that we'd have in ATM's proposal (people adjust on trunk and there's some branch-2 release that flips over) is sufficient given that 2246 was intended as a short term thing. Curious if Nicholas and others think that's reasonable. Thanks, Eli > > Nicholas/Suresh: have you had a chance to review HDFS-347, yet? -C > >> On Mon, Feb 25, 2013 at 12:50 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote: >>> I agree that HDFS-2246 is a short term solution and we should not keep it there forever. However, we still need a transition period to replace an old mechanism by a new one. No? >>> >>> Tsz-Wo >>> >>> >>> >>> >>> ________________________________ >>> From: Eli Collins <[EMAIL PROTECTED]> >>> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; Tsz Wo Sze <[EMAIL PROTECTED]> >>> Sent: Monday, February 25, 2013 10:24 AM >>> Subject: Re: VOTE: HDFS-347 merge >>> >>> On Sat, Feb 23, 2013 at 4:23 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote: >>>> I still do not see a valid reason to remove HDFS-2246 immediately. Some users may have insecure clusters and they don't want to change their configuration. >>> >>> Because it doesn't make sense to support multiple mechanisms for the >>> same thing. >>> >>> 2246 was always intended to be a *short term solution* util 347 was >>> completed, eg see Sanjay's first comment on 2246: "A shortcut has >>> been proposed where the client access the hdfs file blocks directly... >>> This is non-invasive and is a good short term solution till HDFS-347 >>> is completed." >>> >>> Thanks, >>> Eli
-
Re: VOTE: HDFS-347 mergeSuresh Srinivas 2013-02-26, 00:09
ATM's suggestion of removing HDFS-2246 in trunk, but not branch-2, is
> a rational compromise: it allows some period for others to adapt, but > not an indefinite one. It's not clear what you're proposing, if > anything. > I am not sure why a release that supports both these is such a bad idea. As Nicholas has stated these mechanisms are not mutually exclusive. I am -1 on removing HDFS-2246. That should be done in a separate jira and not as a part of HDFS-347. If the concern is the complexity, I have already suggested some choices on how to simplify it. If maintaining the code is a concern, I have also offered to support and maintain it.
-
Re: VOTE: HDFS-347 mergeEli Collins 2013-02-26, 00:39
On Mon, Feb 25, 2013 at 4:09 PM, Suresh Srinivas <[EMAIL PROTECTED]> wrote:
> ATM's suggestion of removing HDFS-2246 in trunk, but not branch-2, is >> a rational compromise: it allows some period for others to adapt, but >> not an indefinite one. It's not clear what you're proposing, if >> anything. >> > > > > I am not sure why a release that supports both these is such a bad idea. > As Nicholas has stated these mechanisms are not mutually exclusive. > I am -1 on removing HDFS-2246. That should be done in a separate jira > and not as a part of HDFS-347. If the concern is the complexity, I have > already suggested some choices on how to simplify it. If maintaining the > code > is a concern, I have also offered to support and maintain it. There's no reason to maintain multiple implementations of the same feature, that's why per the 2246 jira it was proposed as a "good short term solution till HDFS-347 is completed". Why is ATM's compromise unacceptable?
-
Re: VOTE: HDFS-347 mergeSuresh Srinivas 2013-02-26, 17:33
>
> > There's no reason to maintain multiple implementations of the same > feature, that's why per the 2246 jira it was proposed as a "good short > term solution till HDFS-347 is completed". Why is ATM's compromise > unacceptable? > We have already discussed this. Here is the recap: HDFS-347 does not support all the platforms. HDFS-2246 does. So removing HDFS-2246 does not make sense unless HDFS-347 supports all the platforms. I am not arguing we should retain HDFS-2246 forever. I do not currently have bandwidth to add HDFS-347 windows equivalent functionality. When I or someone else adds support for that we can discuss removing HDFS-2246. Until then I can support HDFS-2246 mechanism. I have also proposed how to make the current patch simpler where both the features can live together. I will change my vote to +1 in following two cases. 1. HDFS-347 supports all the other platforms HDFS-2246 supported and hence truly becomes replacement. 2. Do not remove HDFS-2246 support. BTW I think we should switch to some other thread given this merge vote is already past 7 days. -- http://hortonworks.com/download/
-
Re: VOTE: HDFS-347 mergeEli Collins 2013-02-26, 19:24
On Tue, Feb 26, 2013 at 9:33 AM, Suresh Srinivas <[EMAIL PROTECTED]> wrote:
>> >> >> There's no reason to maintain multiple implementations of the same >> feature, that's why per the 2246 jira it was proposed as a "good short >> term solution till HDFS-347 is completed". Why is ATM's compromise >> unacceptable? >> > > We have already discussed this. > > Here is the recap: > HDFS-347 does not support all the platforms. HDFS-2246 does. > So removing HDFS-2246 does not make sense unless HDFS-347 > supports all the platforms. > > I am not arguing we should retain HDFS-2246 forever. I do not currently > have bandwidth to add HDFS-347 windows equivalent functionality. > When I or someone else adds support for that we can discuss removing > HDFS-2246. Until then I can support HDFS-2246 mechanism. I have > also proposed how to make the current patch simpler where both the > features can live together. > > I will change my vote to +1 in following two cases. > 1. HDFS-347 supports all the other platforms HDFS-2246 supported and > hence truly becomes replacement. I assume you mean in trunk? Given that ATM's proposal is to only remove HDFS-2246 from branch-2 once (a) we're confident in HDFS-347 and (b) adds Windows support, and we won't be releasing from trunk any time soon - from a user perspective - HDFS-2246 will only be replaced with HDFS-347 until it supports Windows. Ie ATM's compromise appears to satisfy your requirement.
-
Re: VOTE: HDFS-347 mergeSuresh Srinivas 2013-02-26, 19:35
>
> > I assume you mean in trunk? Given that ATM's proposal is to only > remove HDFS-2246 from branch-2 once (a) we're confident in HDFS-347 > and (b) adds Windows support, and we won't be releasing from trunk any > time soon - from a user perspective - HDFS-2246 will only be replaced > with HDFS-347 until it supports Windows. Ie ATM's compromise appears > to satisfy your requirement. > This means, HDFS-347 becomes available in branch-2 only when the corresponding windows work completes. Given that there is interest in HDFS-347 (see Andrew's email), not removing HDFS-2246 will make it available in an Apache release sooner. I am not sure why removing HDFS-2246 is so important in HDFS-347? Is it because you need to spend bunch of time to rework the code to add it back?
-
RE: VOTE: HDFS-347 mergeBikas Saha 2013-02-26, 21:47
Hi,
In my opinion, this feature of short circuit reads (HDFS-347 or HDFS-2246) is not a desirable feature for HDFS. We should be working towards removing this feature instead of enhancing it and making it popular. Maybe short-circuit reads were something that HBase needed for performance at a point in time when HDFS performance was slow. But after all the improvements that have been made, is it still unacceptably slow to read from HDFS? Is there more good engineering that we can do to close that gap? Close it for all HDFS users and not just the ones who use short-circuit reads? Which brings me to the question - Who is the target audience for this feature? From what I see, anyone who potentially wants to use it =everyone. Now if everyone starts using short circuit reads what happens to the performance problem that we are trying to solve? Will performance still be better then? This is especially important in the context of YARN where we don't control the apps that run on the shared grid. What problem are we trying to solve here? If we want better HDFS performance and QOS for services then we want to give as much control over the disk to HDFS rather than take it away. Short circuit reads leave a gaping hole towards that end and making short circuit reads better and easier to use makes that hole larger. I am sorry for replying late and also because my response might be missing historical perspectives that I am not aware of. Bikas -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Colin McCabe Sent: Sunday, February 17, 2013 1:49 PM To: [EMAIL PROTECTED] Subject: VOTE: HDFS-347 merge Hi all, I would like to merge the HDFS-347 branch back to trunk. It's been under intensive review and testing for several months. The branch adds a lot of new unit tests, and passes Jenkins as of 2/15 [1] We have tested HDFS-347 with both random and sequential workloads. The short-circuit case is substantially faster [2], and overall performance looks very good. This is especially encouraging given that the initial goal of this work was to make security compatible with short-circuit local reads, rather than to optimize the short-circuit code path. We've also stress-tested HDFS-347 on a number of clusters. This iniial VOTE is to merge only into trunk. Just as we have done with our other recent merges, we will consider merging into branch-2 after the code has been in trunk for few weeks. Please cast your vote by EOD Sunday 2/24. best, Colin McCabe [1] https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13579704&p age=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comme nt-13579704 [2] https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13551755&p age=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comme nt-13551755
-
Re: VOTE: HDFS-347 mergeEli Collins 2013-02-26, 21:51
On Tue, Feb 26, 2013 at 11:35 AM, Suresh Srinivas
<[EMAIL PROTECTED]> wrote: >> >> >> I assume you mean in trunk? Given that ATM's proposal is to only >> remove HDFS-2246 from branch-2 once (a) we're confident in HDFS-347 >> and (b) adds Windows support, and we won't be releasing from trunk any >> time soon - from a user perspective - HDFS-2246 will only be replaced >> with HDFS-347 until it supports Windows. Ie ATM's compromise appears >> to satisfy your requirement. >> > > This means, HDFS-347 becomes available in branch-2 only when the > corresponding windows work completes. Given that there is interest > in HDFS-347 (see Andrew's email), not removing HDFS-2246 will make > it available in an Apache release sooner. > > I am not sure why removing HDFS-2246 is so important in HDFS-347? Is it > because you need to spend bunch of time to rework the code to add it back? Yes, this would require updating all the code to make the fallback paths to work properly and re-testing. Given that all the patches posted to 347 have removed 2246 for a while now (this is not a new approach, Colin has been giving heads up months now about the patches and the merge), and 2246 was supposed to be a short-term fix (2246 was only supported on the condition that 347 was a wholesale replacement), it doesn't seem right to hold up 347 up for Windows support given that Windows support has not been merged to trunk yet, is not in any Apache release, etc. Personally I don't like establishing the precedent here that we can hold up a merge due to requirements from an unmerged branch. We'd all like to see 347 show up in branch-2 soon, and the original proposal to merge 347 accomplishes that, it's one of the downsides of the compromise for Windows. Thanks, Eli
-
Re: VOTE: HDFS-347 mergeTodd Lipcon 2013-02-26, 22:07
Hi Bikas,
I completely agree with you in principle -- short circuit reads end up ceding control of the data path from the DataNode to the user applications. This has a few disadvantages which you've mentioned, and have been brought up in the JIRA as well: particularly QoS, metrics, the flexibility to change our data layout on disk in the future, etc. However, the performance advantages of this approach are quite stark when the data sets have been cached in the OS buffer cache. For example, using a low-overhead client like Impala executing a simple table scan query, we've seen a 2x or more improvement in overall response time using short-circuit reads versus localhost TCP. The overhead comes primarily from the kernel layers, not from our own code -- eg localhost TCP connections still perform packet segmentation, enforce multiple buffer copies to and from kernel space, incur several syscalls, etc. A better implemented datanode, and perhaps doing transfer over domain sockets might close the gap a bit, but based on all of my benchmarks, it will still be ~50% slower than short circuit. If you look at the history of HDFS-347, I actually asked Colin to implement and experiment with a non-short-circuit path over domain sockets, under the assumption that they may be more efficient than loopback TCP sockets. The results weren't particularly encouraging, though it may still be enabled for anyone who wants to experiment with optimizing it further. There are also some improvements coming down the road in the Linux kernel (in particular "TCP friends") which can eliminate some of the TCP stack overhead for loopback connections, but unfortunately they're several years off for those of us deploying on mainstream distros. Most of the above is in reference to sequential throughput. Random IO performance is even more drastically effected - the benchmarks I posted on HDFS-347 show a 3-4x improvement in some workloads when the data is in the buffer cache. As the RAM capacities of our machines continue to increase, and as solid state storage becomes more cost effective, more and more random reads fall into this category where they're not bound by the hardware, but rather bound by our software overhead. Given all of the above, I think the performance benefits of short circuit read outweigh the disadvantages. Given that it is entirely an implementation optimization, and not an API, we can always re-evaluate in future versions, if either someone figures out a way to get a non-short-circuit implementation to comparable performance, or if the kernel guys catch up and implement TCP friends and other features which close the gap. Colin has also been careful to build in capability in the API for the datanode to reject a short circuit request, causing a client to seamlessly fall back, based on a version number. This would allow us to change the underlying format on DNs to something which isn't SCR-friendly, without causing any incompatibility in existing clients, etc. Hope the above explains the motivation for the feature. Thanks -Todd On Tue, Feb 26, 2013 at 1:47 PM, Bikas Saha <[EMAIL PROTECTED]> wrote: > Hi, > > In my opinion, this feature of short circuit reads (HDFS-347 or HDFS-2246) > is not a desirable feature for HDFS. We should be working towards removing > this feature instead of enhancing it and making it popular. > > Maybe short-circuit reads were something that HBase needed for performance > at a point in time when HDFS performance was slow. But after all the > improvements that have been made, is it still unacceptably slow to read > from HDFS? Is there more good engineering that we can do to close that > gap? Close it for all HDFS users and not just the ones who use > short-circuit reads? > Which brings me to the question - Who is the target audience for this > feature? From what I see, anyone who potentially wants to use it => everyone. Now if everyone starts using short circuit reads what happens to > the performance problem that we are trying to solve? Will performance Todd Lipcon Software Engineer, Cloudera
-
Re: VOTE: HDFS-347 mergeChris Douglas 2013-02-27, 00:52
Eli, you've sent the same email a half dozen times in the last ~24
hours. You might try a different tactic. Suresh, if you're willing to "support and maintain" HDFS-2246, do you have cycles to propose a patch to the HDFS-347 branch reintegrating HDFS-2246 with the simplifications you outlined? In your review, did you find anything else you'd like to address prior to the merge, or is this the only item? -C On Tue, Feb 26, 2013 at 1:51 PM, Eli Collins <[EMAIL PROTECTED]> wrote: > On Tue, Feb 26, 2013 at 11:35 AM, Suresh Srinivas > <[EMAIL PROTECTED]> wrote: >>> >>> >>> I assume you mean in trunk? Given that ATM's proposal is to only >>> remove HDFS-2246 from branch-2 once (a) we're confident in HDFS-347 >>> and (b) adds Windows support, and we won't be releasing from trunk any >>> time soon - from a user perspective - HDFS-2246 will only be replaced >>> with HDFS-347 until it supports Windows. Ie ATM's compromise appears >>> to satisfy your requirement. >>> >> >> This means, HDFS-347 becomes available in branch-2 only when the >> corresponding windows work completes. Given that there is interest >> in HDFS-347 (see Andrew's email), not removing HDFS-2246 will make >> it available in an Apache release sooner. >> >> I am not sure why removing HDFS-2246 is so important in HDFS-347? Is it >> because you need to spend bunch of time to rework the code to add it back? > > Yes, this would require updating all the code to make the fallback > paths to work properly and re-testing. Given that all the patches > posted to 347 have removed 2246 for a while now (this is not a new > approach, Colin has been giving heads up months now about the patches > and the merge), and 2246 was supposed to be a short-term fix (2246 was > only supported on the condition that 347 was a wholesale replacement), > it doesn't seem right to hold up 347 up for Windows support given that > Windows support has not been merged to trunk yet, is not in any Apache > release, etc. Personally I don't like establishing the precedent here > that we can hold up a merge due to requirements from an unmerged > branch. > > We'd all like to see 347 show up in branch-2 soon, and the original > proposal to merge 347 accomplishes that, it's one of the downsides of > the compromise for Windows. > > Thanks, > Eli On Tue, Feb 26, 2013 at 1:51 PM, Eli Collins <[EMAIL PROTECTED]> wrote: > On Tue, Feb 26, 2013 at 11:35 AM, Suresh Srinivas > <[EMAIL PROTECTED]> wrote: >>> >>> >>> I assume you mean in trunk? Given that ATM's proposal is to only >>> remove HDFS-2246 from branch-2 once (a) we're confident in HDFS-347 >>> and (b) adds Windows support, and we won't be releasing from trunk any >>> time soon - from a user perspective - HDFS-2246 will only be replaced >>> with HDFS-347 until it supports Windows. Ie ATM's compromise appears >>> to satisfy your requirement. >>> >> >> This means, HDFS-347 becomes available in branch-2 only when the >> corresponding windows work completes. Given that there is interest >> in HDFS-347 (see Andrew's email), not removing HDFS-2246 will make >> it available in an Apache release sooner. >> >> I am not sure why removing HDFS-2246 is so important in HDFS-347? Is it >> because you need to spend bunch of time to rework the code to add it back? > > Yes, this would require updating all the code to make the fallback > paths to work properly and re-testing. Given that all the patches > posted to 347 have removed 2246 for a while now (this is not a new > approach, Colin has been giving heads up months now about the patches > and the merge), and 2246 was supposed to be a short-term fix (2246 was > only supported on the condition that 347 was a wholesale replacement), > it doesn't seem right to hold up 347 up for Windows support given that > Windows support has not been merged to trunk yet, is not in any Apache > release, etc. Personally I don't like establishing the precedent here > that we can hold up a merge due to requirements from an unmerged > branch.
-
Re: VOTE: HDFS-347 mergeSuresh Srinivas 2013-02-27, 01:09
>
> Suresh, if you're willing to "support and maintain" HDFS-2246, do you > have cycles to propose a patch to the HDFS-347 branch reintegrating > HDFS-2246 with the simplifications you outlined? In your review, did > you find anything else you'd like to address prior to the merge, or is > this the only item? -C Yes, I can work on adding HDFS-2246 back in HDFS-347 branch. I will get to it in a week or two, if that is okay. I have not looked at the patch closely. But I think any issues found could be fixed in trunk and should not block the merge.
-
Re: VOTE: HDFS-347 mergesanjay Radia 2013-02-27, 01:36
On Feb 20, 2013, at 5:12 PM, Aaron T. Myers wrote: > On Wed, Feb 20, 2013 at 4:29 PM, Chris Douglas <[EMAIL PROTECTED]> wrote: > >> Given that HDFS-347 is a strictly better approach, once committed, >> there will be ample motivation to add support for other OSes and >> remove HDFS-2246 entirely. Nobody is confused about this. There's >> ample precedent for retaining obscure, clumsy features as a temporary >> stop-gap (e.g., service plugins, opaque blobs of bytes in Tasks, >> configurable combiner semantics). What's the virtue of insisting on >> removing this? Unless there was a lot of follow-on work, HDFS-2246 >> doesn't look like a lot of code... >> > > Chris's comment on keeping the removal of HDFS-2246 independent of HDFS-347 makes sense in this specific case and also in the general case where new optimizations are added. One objection is code complexity of fallback code: > Though it's not a ton of code, I think that having to support a more > complex fallback path (i.e. try the HDFS-347 method, then fall back to > trying the HDFS-2246 method, then fall back to doing normal TCP reads to > the local DN) will make the code quite a bit hairier for little added > benefit. Aaron, what if you didn't fall back in the fashion you suggest, but instead the code does one HDFS-2246 OR HDFS-347 when that option is set and HDFS-347 when both are set. Hence the fallback code will be of the same complexity. Clearly one could add a platform specific optimization down the road; one could think of HDFS-2246 as being a windows optimization. Another objection is who will maintain the HDFS-2246 code: Suresh has volunteered to maintain the code. What are the remaining objections leaving both styles of optimizations in the code base? sanjay
-
Re: VOTE: HDFS-347 mergesanjay Radia 2013-02-27, 19:45
On Feb 26, 2013, at 1:51 PM, Eli Collins wrote: > it doesn't seem right to hold up 347 up for Windows support given that > Windows support has not been merged to trunk yet, is not in any Apache > release, etc. Personally I don't like establishing the precedent here > that we can hold up a merge due to requirements from an unmerged > branch. It is not being held back of for the windows port. It is being held back because 2246 should not be removed as part of 347; a separate jira should had been filed to remove it. Chris argues well in an earlier email to remove 2246 on its own time and that there is precedent for doing so: Snippet of his email below: On Feb 20, 2013, at 4:29 PM, Chris Douglas wrote: > There's > ample precedent for retaining obscure, clumsy features as a temporary > stop-gap (e.g., service plugins, opaque blobs of bytes in Tasks, > configurable combiner semantics). What's the virtue of insisting on > removing this? sanjay
-
Re: VOTE: HDFS-347 mergeEli Collins 2013-02-27, 20:06
On Wed, Feb 27, 2013 at 11:45 AM, sanjay Radia <[EMAIL PROTECTED]> wrote:
> > On Feb 26, 2013, at 1:51 PM, Eli Collins wrote: > >> it doesn't seem right to hold up 347 up for Windows support given that >> Windows support has not been merged to trunk yet, is not in any Apache >> release, etc. Personally I don't like establishing the precedent here >> that we can hold up a merge due to requirements from an unmerged >> branch. > > It is not being held back of for the windows port. It is being held back because 2246 should not be removed as part of 347; a separate jira should had been filed to remove it. This isn't about just having a separate jira though right? We could easily pull the change out to two jiras (one removes 2246 and then next adds 347), they weren't separated because the goal for 347 was to be a re-write of the same feature (direct reads). You commented on 2246 that it is a temporary workaround for 347, do you no longer feel that way? Your reply to ATM made it seem like this was something that we'd be maintaining for a while (vs being a stopgap until 347 adds Windows support).
-
Re: VOTE: HDFS-347 mergeColin McCabe 2013-02-27, 23:28
Here is a compromise proposal, which hopefully will satisfy both sides:
We keep the old block reader and have a configuration option that enables it. So in addition to dfs.client.use.legacy.blockreader, which we already have, we would have dfs.client.use.legacy.blockreader.local. Does that make sense? best, Colin On Wed, Feb 27, 2013 at 12:06 PM, Eli Collins <[EMAIL PROTECTED]> wrote: > On Wed, Feb 27, 2013 at 11:45 AM, sanjay Radia <[EMAIL PROTECTED]> wrote: >> >> On Feb 26, 2013, at 1:51 PM, Eli Collins wrote: >> >>> it doesn't seem right to hold up 347 up for Windows support given that >>> Windows support has not been merged to trunk yet, is not in any Apache >>> release, etc. Personally I don't like establishing the precedent here >>> that we can hold up a merge due to requirements from an unmerged >>> branch. >> >> It is not being held back of for the windows port. It is being held back because 2246 should not be removed as part of 347; a separate jira should had been filed to remove it. > > This isn't about just having a separate jira though right? We could > easily pull the change out to two jiras (one removes 2246 and then > next adds 347), they weren't separated because the goal for 347 was to > be a re-write of the same feature (direct reads). You commented on > 2246 that it is a temporary workaround for 347, do you no longer feel > that way? Your reply to ATM made it seem like this was something that > we'd be maintaining for a while (vs being a stopgap until 347 adds > Windows support).
-
Re: VOTE: HDFS-347 mergeChris Douglas 2013-02-27, 23:29
Suresh offered to write a patch restoring HDFS-2246, so unless his
timeline is unacceptable, I think we're done. On Wed, Feb 27, 2013 at 11:45 AM, sanjay Radia <[EMAIL PROTECTED]> wrote: > It is not being held back of for the windows port. It is being held back because 2246 should not be removed as part of 347; a separate jira should had been filed to remove it. > Chris argues well in an earlier email to remove 2246 on its own time and that there is precedent for doing so: Snippet of his email below: I must have been unclear. HDFS-347 supports a subset of the current users of HDFS-2246, so some devs have asked for time to accommodate them. There's ample precedent for *that*, even when the implementation of the obsolete feature is flawed. However, that accommodation was rarely indefinite, and usually scoped to a release or two. As one of the examples: combiners incompatible with Pig's use required a config knob in a patch version; it was removed in the subsequent release. The details matter, so no policy is possible, but parties may consider the pressure applied by removing HDFS-2246 in trunk as sufficient and appropriate, even if HDFS-2246 lives on in the 2.x branch. FWIW, I think that's the compromise solution. -C > On Feb 20, 2013, at 4:29 PM, Chris Douglas wrote: > >> There's >> ample precedent for retaining obscure, clumsy features as a temporary >> stop-gap (e.g., service plugins, opaque blobs of bytes in Tasks, >> configurable combiner semantics). What's the virtue of insisting on >> removing this? > > > sanjay
-
Re: VOTE: HDFS-347 mergeEli Collins 2013-02-27, 23:42
+1
It sounds like restoring 2246 to the 347 patch is the only path forward (ie there will be zero compromise on a proposal that removes 2246 in any form) in which case this seems like a good way to implement that (similar to what Sanjay suggested earlier). We can have a separate jira for removing 2246 and/or adding Windows support to 347, and move the discussion about how long 2246 should last, and in what branches there. On Wed, Feb 27, 2013 at 3:28 PM, Colin McCabe <[EMAIL PROTECTED]> wrote: > Here is a compromise proposal, which hopefully will satisfy both sides: > We keep the old block reader and have a configuration option that enables it. > > So in addition to dfs.client.use.legacy.blockreader, which we already > have, we would have dfs.client.use.legacy.blockreader.local. > > Does that make sense? > > best, > Colin > > > On Wed, Feb 27, 2013 at 12:06 PM, Eli Collins <[EMAIL PROTECTED]> wrote: >> On Wed, Feb 27, 2013 at 11:45 AM, sanjay Radia <[EMAIL PROTECTED]> wrote: >>> >>> On Feb 26, 2013, at 1:51 PM, Eli Collins wrote: >>> >>>> it doesn't seem right to hold up 347 up for Windows support given that >>>> Windows support has not been merged to trunk yet, is not in any Apache >>>> release, etc. Personally I don't like establishing the precedent here >>>> that we can hold up a merge due to requirements from an unmerged >>>> branch. >>> >>> It is not being held back of for the windows port. It is being held back because 2246 should not be removed as part of 347; a separate jira should had been filed to remove it. >> >> This isn't about just having a separate jira though right? We could >> easily pull the change out to two jiras (one removes 2246 and then >> next adds 347), they weren't separated because the goal for 347 was to >> be a re-write of the same feature (direct reads). You commented on >> 2246 that it is a temporary workaround for 347, do you no longer feel >> that way? Your reply to ATM made it seem like this was something that >> we'd be maintaining for a while (vs being a stopgap until 347 adds >> Windows support).
-
Re: VOTE: HDFS-347 mergeColin McCabe 2013-03-05, 20:24
On Tue, Feb 26, 2013 at 5:09 PM, Suresh Srinivas <[EMAIL PROTECTED]> wrote:
>> >> Suresh, if you're willing to "support and maintain" HDFS-2246, do you >> have cycles to propose a patch to the HDFS-347 branch reintegrating >> HDFS-2246 with the simplifications you outlined? In your review, did >> you find anything else you'd like to address prior to the merge, or is >> this the only item? -C > > > Yes, I can work on adding HDFS-2246 back in HDFS-347 branch. I will get to > it > in a week or two, if that is okay. > > I have not looked at the patch closely. But I think any issues found could > be fixed in trunk and should not block the merge. Hi Suresh, HDFS-4538 adds the ability to use the old block reader in the HDFS-347 branch. Check it out. Colin
-
Re: VOTE: HDFS-347 mergeSuresh Srinivas 2013-03-05, 21:09
Thanks Colin. Will check it out as soon as I can.
On Tue, Mar 5, 2013 at 12:24 PM, Colin McCabe <[EMAIL PROTECTED]>wrote: > On Tue, Feb 26, 2013 at 5:09 PM, Suresh Srinivas <[EMAIL PROTECTED]> > wrote: > >> > >> Suresh, if you're willing to "support and maintain" HDFS-2246, do you > >> have cycles to propose a patch to the HDFS-347 branch reintegrating > >> HDFS-2246 with the simplifications you outlined? In your review, did > >> you find anything else you'd like to address prior to the merge, or is > >> this the only item? -C > > > > > > Yes, I can work on adding HDFS-2246 back in HDFS-347 branch. I will get > to > > it > > in a week or two, if that is okay. > > > > I have not looked at the patch closely. But I think any issues found > could > > be fixed in trunk and should not block the merge. > > Hi Suresh, > > HDFS-4538 adds the ability to use the old block reader in the HDFS-347 > branch. Check it out. > > Colin > -- http://hortonworks.com/download/
-
Re: VOTE: HDFS-347 mergeTsz Wo Sze 2013-03-05, 23:08
Hi Colin,
It is great to hear that you agree to keep HDFS-2246. Please as well address my comments posted on HDFS-347 and let me know once you have posted a new patch on HDFS-347. Thanks a lot! BTW, you should conclude the VOTE you started and use a separated thread for other discussion as Suresh mentioned earlier. I hope you could use our convention on voting. Otherwise, it is hard for others to follow. Tsz-Wo ________________________________ From: Suresh Srinivas <[EMAIL PROTECTED]> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> Sent: Wednesday, March 6, 2013 5:09 AM Subject: Re: VOTE: HDFS-347 merge Thanks Colin. Will check it out as soon as I can. On Tue, Mar 5, 2013 at 12:24 PM, Colin McCabe <[EMAIL PROTECTED]>wrote: > On Tue, Feb 26, 2013 at 5:09 PM, Suresh Srinivas <[EMAIL PROTECTED]> > wrote: > >> > >> Suresh, if you're willing to "support and maintain" HDFS-2246, do you > >> have cycles to propose a patch to the HDFS-347 branch reintegrating > >> HDFS-2246 with the simplifications you outlined? In your review, did > >> you find anything else you'd like to address prior to the merge, or is > >> this the only item? -C > > > > > > Yes, I can work on adding HDFS-2246 back in HDFS-347 branch. I will get > to > > it > > in a week or two, if that is okay. > > > > I have not looked at the patch closely. But I think any issues found > could > > be fixed in trunk and should not block the merge. > > Hi Suresh, > > HDFS-4538 adds the ability to use the old block reader in the HDFS-347 > branch. Check it out. > > Colin > -- http://hortonworks.com/download/
-
VOTE: HDFS-347 mergeColin McCabe 2013-04-01, 23:32
Hi all,
I think it's time to merge the HDFS-347 branch back to trunk. It's been under review and testing for several months, and provides both a performance advantage, and the ability to use short-circuit local reads without compromising system security. Previously, we tried to merge this and the objection was brought up that we should keep the old, insecure short-circuit local reads around so that platforms for which secure SCR had not yet been implemented could use it (e.g. Windows). This has been addressed-- see HDFS-4538 for details. Suresh has also volunteered to maintain the insecure SCR code until secure SCR can be implemented for Windows. Please cast your vote by EOD Monday 4/8. best, Colin |