Eric Baldeschwieler
2011-06-17, 07:17
Ryan Rawson
2011-06-17, 07:36
Ted Dunning
2011-06-17, 08:13
Arun C Murthy
2011-06-17, 14:15
Todd Lipcon
2011-06-17, 18:21
Sanjay Radia
2011-06-20, 20:44
Suresh Srinivas
2011-06-24, 21:07
Mahadev Konar
2011-06-24, 22:05
Arun C Murthy
2011-06-25, 00:28
Todd Lipcon
2011-06-25, 05:38
Doug Meil
2011-06-17, 13:21
Brian Bockelman
2011-06-17, 14:30
Jay Booth
2011-06-17, 14:37
Arun C Murthy
2011-06-17, 14:42
Todd Lipcon
2011-06-17, 17:33
Allen Wittenauer
2011-06-17, 17:33
Eric Baldeschwieler
2011-06-18, 06:10
Allen Wittenauer
2011-06-17, 17:31
Allen Wittenauer
2011-06-17, 20:27
Rajiv Chittajallu
2011-06-18, 02:02
Allen Wittenauer
2011-06-18, 02:15
Arun C Murthy
2011-06-18, 04:42
Steve Loughran
2011-06-21, 11:27
-
Thinking about the next hadoop mainline release
Eric Baldeschwieler 2011-06-17, 07:17

Hi Folks,

I'd like to start a conversation on mainline planning and the next release of Apache Hadoop beyond 0.22.

The Yahoo! Hadoop team has been working hard to complete several big Hadoop projects, including:

- HDFS Federation [HDFS-1052]
  - Already merged into trunk

- Next Generation Map-Reduce [MR-279]
  - Passing most tests now and discussing merging into trunk

- The merging of our previous work on Hadoop with security into mainline [http://yhoo.it/i9Ww8W]
  - This is mostly done, but Owen and others are doing a scrub to close out the remaining issues

All of these projects are now reaching a place where we would like to combine them with the good work already in 0.22 and put out a new Apache release, perhaps 0.23. We think the best way to accomplish that is to finish the merge in the next few weeks and then cut a release from trunk.

Yahoo stands ready to help us (the Apache Hadoop Community) turn this new release into a stable release by running it through its 9 month test and burn in process. The result of that will be another stable release such as 0.18, 0.20 or 0.20.203 (hadoop with security). We have Yahoo!'s support for this substantial investment because this new release will have a great combination of new features for small and very large sites alike:
- New Write Pipeline - HBase support [also in 0.21 & 0.22]
- Federation - Scale up to larger clusters and the ability to experiment with new namenode approaches
- Next Gen MapReduce - Scaleup, performance improvements, ability to experiment with new processing frameworks

I think this effort will produce a great new Apache Hadoop release for the community. I'm starting this thread to collect feedback and hopefully folks' endorsement for merging in MR-279 and putting together this new release. Feedback please?

Thanks,

E14
-
Re: Thinking about the next hadoop mainline release
Ryan Rawson 2011-06-17, 07:36

HDFS-918 and HDFS-347 are absolutely critical for random read performance. The smarter sites are already running HDFS-347 (I guess they aren't running "Hadoop" then?), and soon they will be testing and running HDFS-918 as well. Opening 1 socket for every read just isn't really scalable.

-ryan
-
Re: Thinking about the next hadoop mainline release
Ted Dunning 2011-06-17, 08:13

NG map reduce is a huge deal, both in terms of making things better for users and in terms of unblocking the Hadoop development process.

On Fri, Jun 17, 2011 at 9:36 AM, Ryan Rawson wrote:
> > - Next Generation Map-Reduce [MR-279]
> >   - Passing most tests now and discussing merging into trunk
-
Re: Thinking about the next hadoop mainline release
Arun C Murthy 2011-06-17, 14:15

I volunteer to be the RM for the release since I've been leading the NG MR effort.

Are folks ok with this?

thanks,
Arun
-
Re: Thinking about the next hadoop mainline release
Todd Lipcon 2011-06-17, 18:21

On Fri, Jun 17, 2011 at 7:15 AM, Arun C Murthy wrote:
> I volunteer to be the RM for the release since I've been leading the NG MR effort.
>
> Are folks ok with this?

+1. It would be an honor to fix bugs for you, Arun.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
-
Re: Thinking about the next hadoop mainline release
Sanjay Radia 2011-06-20, 20:44

On Jun 17, 2011, at 7:15 AM, Arun C Murthy wrote:
> I volunteer to be the RM for the release since I've been leading the
> NG MR effort.
>
> Are folks ok with this?

+1

sanjay
-
Re: Thinking about the next hadoop mainline release
Suresh Srinivas 2011-06-24, 21:07

+1. Arun, I can also help you with managing the release for HDFS.
-
Re: Thinking about the next hadoop mainline release
Mahadev Konar 2011-06-24, 22:05

+1. I'd be happy to help in any way possible.

thanks
mahadev
@mahadevkonar
-
Re: Thinking about the next hadoop mainline release
Arun C Murthy 2011-06-25, 00:28

Thanks Suresh!

Todd - I'd appreciate it if you could help on some of the HBase/Performance jiras... thanks!
-
Re: Thinking about the next hadoop mainline release
Todd Lipcon 2011-06-25, 05:38

On Fri, Jun 24, 2011 at 5:28 PM, Arun C Murthy wrote:
> Thanks Suresh!
>
> Todd - I'd appreciate if you could help on some of the HBase/Performance
> jiras... thanks!

Sure thing.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
-
RE: Thinking about the next hadoop mainline release
Doug Meil 2011-06-17, 13:21

+1 on what Ryan said.
-
Re: Thinking about the next hadoop mainline release
Brian Bockelman 2011-06-17, 14:30

Hi Ryan, Eric,

Just looked at those two for the first time in a while.
- HDFS-918 (now 1323?) doesn't seem like it's too controversial, but does seem like there's a bit of validation left.
- HDFS-347 has a long, contentious history. However, it seems that most of the strong objections have been cleared up. Is there anyone left who objects to it, now that it doesn't appear to bypass security?

Finally, I see Todd has posted HDFS-2080 claiming some sizable performance improvements. Would it be possible for that to be finished in time for the release?

As a site which heavily uses random reads and high-throughput reads, I'm very excited for this release!

Brian
-
Re: Thinking about the next hadoop mainline release
Jay Booth 2011-06-17, 14:37

I can look at 1323 (HDFS-918's successor) next week/weekend and clear the test problems. Thanks, Todd, for updating the patch to current trunk. 1323 is only filechannel-pooling, which is much less disruptive than refactoring everything in the DN to be event-driven.
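
The handle-pooling idea behind 1323 can be sketched roughly as follows. This is only an illustration in Python, not the HDFS-1323 patch: `FileHandlePool`, `read_at`, and the `capacity` parameter are hypothetical names, and the DataNode's real code is Java. It shows repeated random reads against the same file reusing one open handle instead of reopening it for each read.

```python
import os
import tempfile
from collections import OrderedDict


class FileHandlePool:
    """Hypothetical LRU pool of open read-only file handles, keyed by path."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self._handles = OrderedDict()
        self.opens = 0  # counts how often a file actually had to be opened

    def _get(self, path):
        handle = self._handles.pop(path, None)
        if handle is None:
            handle = open(path, "rb")
            self.opens += 1
        self._handles[path] = handle  # re-insert as most recently used
        while len(self._handles) > self.capacity:
            _, oldest = self._handles.popitem(last=False)
            oldest.close()  # evict the least recently used handle
        return handle

    def read_at(self, path, offset, length):
        """Random read: seek on the pooled handle, then read `length` bytes."""
        handle = self._get(path)
        handle.seek(offset)
        return handle.read(length)

    def close(self):
        while self._handles:
            _, handle = self._handles.popitem()
            handle.close()


# Small demonstration against a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
    demo_path = tmp.name

pool = FileHandlePool(capacity=2)
first = pool.read_at(demo_path, 2, 3)   # opens the file
second = pool.read_at(demo_path, 7, 2)  # reuses the pooled handle
pool.close()
os.unlink(demo_path)
```

The sketch is not thread-safe; a real pool serving concurrent readers would need locking or per-reader handles.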
-
Re: Thinking about the next hadoop mainline release
Arun C Murthy 2011-06-17, 14:42

Ryan & Brian,

All it takes to get something included in a release branch after the branch is cut is for someone to convince the RM to include it. It should be fairly straightforward. OTOH, if it's in trunk when the branch is made, this discussion is moot.

So, please provide the necessary feedback to the RM when the branch is made, and let's focus on a high-level goal for the next release off trunk in this thread.

Makes sense?

thanks,
Arun
-
Re: Thinking about the next hadoop mainline release
Todd Lipcon 2011-06-17, 17:33

On Fri, Jun 17, 2011 at 7:30 AM, Brian Bockelman wrote:
> Hi Ryan, Eric,
>
> Just looked at those two for the first time in a while.
> - HDFS-918 (now 1323?) doesn't seem like it's too controversial, but does seem like there's a bit of validation left.

Yes, 1323 and also 1148 would be "nice to haves", but neither is ready to go yet. Though I really want to improve HBase performance, I also tend to be fairly conservative on how much testing these things should need before getting checked in (unless they can be completely pluggable). The good news is we did get 941 in last week, and that's a real nice improvement.

> - HDFS-347 has a long, contentious history. However, it seems that most of the strong objections have been cleared up. Is there anyone left who objects to it, now that it doesn't appear to bypass security?

It still has a way to go to be pushed over the finish line. I don't foresee it happening for this release.

> Finally, I see Todd has posted HDFS-2080 claiming some sizable performance improvements. Would it be possible for that to be finished in time for the release?

HDFS-2080 has very good bang-for-the-buck in the gains-per-complexity ratio, especially compared to 347. It could also be made completely pluggable, since it's just a new implementation of BlockReader. So it might be feasible to include it but leave it disabled by default.

But I wouldn't block the 0.23 (or any other) release on including these things. If they're done and look low-risk at an early enough date, I'll do my best to convince the RM to include them, but if they haven't had enough testing, then off to the next release with 'em.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
-
Re: Thinking about the next hadoop mainline release
Allen Wittenauer 2011-06-17, 17:33

On Jun 17, 2011, at 12:36 AM, Ryan Rawson wrote:
> HDFS-918 and HDFS-347 are absolutely critical for random read
> performance. The smarter sites are already running HDFS-347 (I guess
> they aren't running "Hadoop" then?), and soon they will be testing and
> running HDFS-918 as well. Opening 1 socket for every read just isn't
> really scalable.

Isn't "random read [on HDFS]" and "smarter sites" in the same breath an oxymoron?
-
Re: Thinking about the next hadoop mainline release
Eric Baldeschwieler 2011-06-18, 06:10

Hey Allen,

I agree with you that we should avoid future regressions like this. I think the tradeoff that got security working well on one heavily used platform was the right one for hadoop-with-security, but I'll be sure to raise such issues for discussion in the future as soon as I become aware of them.

I agree that we should think about how to generalize the security work for other platforms. That work is just waiting for someone to jump in....

On Ganglia and the metrics framework, it would be great if you could put your heads together with rajive and come up with a joint proposal on what docs / code are needed to fix the regression in a clean way. That sounds like something we should be thinking about fixing in the 20 line and all future releases.

thanks,
E14
-
Re: Thinking about the next hadoop mainline release
Allen Wittenauer 2011-06-17, 17:31

On Jun 17, 2011, at 12:17 AM, Eric Baldeschwieler wrote:
> Yahoo stands ready to help us (the Apache Hadoop Community) turn this new release into a stable release by running it through its 9 month test and burn in process. The result of that will be another stable release such as 0.18, 0.20 or 0.20.203 (hadoop with security).

I'd consider 0.20.203 pseudo-stable. It has some significant regressions on non-Linux platforms due to libhadoop.so being a dumping ground for all compiled code. On OS X, hadoop actually lies about things now. Nine months of testing is useless for those folks if the outcome is another 0.20.203.

> I think this effort will produce a great new Apache Hadoop release for the community. I'm starting this thread to collect feedback and hopefully folks' endorsement for merging in MR-279 and putting together this new release. Feedback please?

For .23, we desperately need to have libhadoop tell the Java code what it supports rather than just assuming that all the functionality is present if the library loads. I pretty much consider this a blocking issue.
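
One way to read the request above is as a capability query: the native library reports which features it was built with, and the caller checks each feature instead of treating a successful load as proof that everything is present. The sketch below is in Python for brevity, and every name in it (`NativeLibrary`, `supports`, `use_native`, the feature strings) is hypothetical; it does not describe libhadoop's actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NativeLibrary:
    """Hypothetical stand-in for a loaded native library that lists its features."""
    platform: str
    features: frozenset = frozenset()

    def supports(self, feature):
        # The library itself answers; support is never inferred from
        # the mere fact that loading succeeded.
        return feature in self.features


def use_native(lib, feature):
    """Return True only if a library is loaded AND it reports the feature."""
    return lib is not None and lib.supports(feature)


# Two made-up builds: the second loads fine but lacks one feature.
linux_build = NativeLibrary("linux", frozenset({"compression", "secure_io"}))
osx_build = NativeLibrary("osx", frozenset({"compression"}))
```

With a check like `use_native(osx_build, "secure_io")`, the caller can fall back or fail loudly for that one feature while still using the native code paths the build does provide.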
-
Re: Thinking about the next hadoop mainline release
Allen Wittenauer 2011-06-17, 20:27

On Jun 17, 2011, at 10:31 AM, Allen Wittenauer wrote:
> On Jun 17, 2011, at 12:17 AM, Eric Baldeschwieler wrote:
>> Yahoo stands ready to help us (the Apache Hadoop Community) turn this new release into a stable release by running it through its 9 month test and burn in process. The result of that will be another stable release such as 0.18, 0.20 or 0.20.203 (hadoop with security).
>
> I'd consider 0.20.203 pseudo-stable.

Actually, I was just reminded about the complete disaster that is metrics. So while it may be pseudo-stable, it isn't actually usable for anyone but Yahoo!.
-
Re: Thinking about the next hadoop mainline release
Rajiv Chittajallu 2011-06-18, 02:02

Allen Wittenauer wrote on 06/17/11 at 13:27:43 -0700:

> Actually, I was just reminded about the complete disaster that is metrics. So while it may be pseudo-stable, it isn't actually usable for anyone but Yahoo!.

Did you try to use the new metrics framework? Are you complaining cause there is no port of ganglia metrics module to metrics V2?

Its pluggable and you can choose to disable it if you want to. All metrics and status information is available via jmx. Multiple sinks and filters, more importantly you can refresh metrics configs without restarting the process.

-rajive
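As a rough illustration of reading values over JMX, the sketch below lists every readable attribute of the MBeans matching a name pattern, using only the standard `javax.management` API. It queries the MBean server of the JVM it runs in, so it would have to run inside a daemon (or be adapted to a remote connector) to see that daemon's metrics. The class name `JmxDump` is made up, and because the thread does not give the object names Hadoop registers, the default pattern is a standard JVM bean.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical helper, not part of Hadoop: dumps MBean attributes from the local JVM. */
final class JmxDump {

    /** Returns "objectName/attribute" -> value for every readable attribute that matches. */
    static Map<String, Object> readAttributes(String pattern) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Map<String, Object> values = new TreeMap<>();
        for (ObjectName name : server.queryNames(new ObjectName(pattern), null)) {
            for (MBeanAttributeInfo attr : server.getMBeanInfo(name).getAttributes()) {
                if (!attr.isReadable()) {
                    continue;
                }
                try {
                    values.put(name + "/" + attr.getName(),
                               server.getAttribute(name, attr.getName()));
                } catch (Exception e) {
                    // Some attributes are unsupported on a given JVM; skip them.
                }
            }
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Default to a bean every JVM exposes; pass another pattern as the first argument.
        String pattern = args.length > 0 ? args[0] : "java.lang:type=Runtime";
        readAttributes(pattern).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```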
-
Re: Thinking about the next hadoop mainline release
Allen Wittenauer 2011-06-18, 02:15

On Jun 17, 2011, at 7:02 PM, Rajiv Chittajallu wrote:

> Allen Wittenauer wrote on 06/17/11 at 13:27:43 -0700:
>> Actually, I was just reminded about the complete disaster that is metrics. So while it may be pseudo-stable, it isn't actually usable for anyone but Yahoo!.
>
> Did you try to use the new metrics framework? Are you complaining
> cause there is no port of ganglia metrics module to metrics V2?

Like 80-90%+ of the Hadoop sites that gather metrics, yes, we use Ganglia.

> Its pluggable and you can choose to disable it if you want to.
> All metrics and status information is available via jmx.
> Multiple sinks and filters, more importantly you can refresh
> metrics configs without restarting the process.

But first we need to switch to a new metrics infrastructure. This wouldn't have been so bad if the release notes actually had said "You know your current metrics system? Yeah, toss it because we just broke everything."

I'm certainly not a fan of Ganglia, but this sort of unexpected breakage is very, very bad.
-
Re: Thinking about the next hadoop mainline release
Arun C Murthy 2011-06-18, 04:42

Allen,

Can we pls focus on the next release here, on this thread?

Please provide your valuable f/b to the RM abt stuff you'd like to see in the next release.

thanks,
Arun

Sent from my iPhone

On Jun 18, 2011, at 7:45 AM, "Allen Wittenauer" <[EMAIL PROTECTED]> wrote:

> On Jun 17, 2011, at 7:02 PM, Rajiv Chittajallu wrote:
>
>> Allen Wittenauer wrote on 06/17/11 at 13:27:43 -0700:
>>> Actually, I was just reminded about the complete disaster that is metrics. So while it may be pseudo-stable, it isn't actually usable for anyone but Yahoo!.
>>
>> Did you try to use the new metrics framework? Are you complaining
>> cause there is no port of ganglia metrics module to metrics V2?
>
> Like 80-90%+ of the Hadoop sites that gather metrics, yes, we use Ganglia.
>
>> Its pluggable and you can choose to disable it if you want to.
>> All metrics and status information is available via jmx.
>> Multiple sinks and filters, more importantly you can refresh
>> metrics configs without restarting the process.
>
> But first we need to switch to a new metrics infrastructure. This wouldn't have been so bad if the release notes actually had said "You know your current metrics system? Yeah, toss it because we just broke everything."
>
> I'm certainly not a fan of Ganglia, but this sort of unexpected breakage is very, very bad.
-
Re: Thinking about the next hadoop mainline release
Steve Loughran 2011-06-21, 11:27

On 17/06/2011 21:27, Allen Wittenauer wrote:

> On Jun 17, 2011, at 10:31 AM, Allen Wittenauer wrote:
>
>> On Jun 17, 2011, at 12:17 AM, Eric Baldeschwieler wrote:
>>> Yahoo stands ready to help us (the Apache Hadoop Community) turn this new release into a stable release by running it through its 9 month test and burn in process. The result of that will be another stable release such as 0.18, 0.20 or 0.20.203 (hadoop with security).
>>
>> I'd consider 0.20.203 pseudo-stable.
>
> Actually, I was just reminded about the complete disaster that is metrics. So while it may be pseudo-stable, it isn't actually usable for anyone but Yahoo!.

ooh, I have work on that that I should catch up with and get in.

I'd also like to put in the IBM JVM patch, not out of personal need, but because I don't like Oracle-JVM dependencies, and it'd be nice to get IBM's donation into the source tree.