Arun C Murthy
2011-09-26, 21:07
Roman Shaposhnik
2011-09-26, 22:15
Arun C Murthy
2011-09-27, 06:20
Arun C Murthy
2011-09-27, 06:40
Doug Cutting
2011-09-27, 15:50
Roman Shaposhnik
2011-09-27, 19:24
Todd Lipcon
2011-09-27, 19:56
Arun C Murthy
2011-09-28, 20:55
Jeff Hammerbacher
2011-09-29, 18:00
Eric Baldeschwieler
2011-09-29, 19:35
Doug Cutting
2011-09-29, 21:38
Eric Baldeschwieler
2011-09-30, 05:27
Konstantin Shvachko
2011-09-30, 09:23
Steve Loughran
2011-09-30, 09:34
Steve Loughran
2011-09-30, 10:17
Milind.Bhandarkar@...
2011-09-30, 16:18
Andrew Purtell
2011-09-30, 16:34
Matt Foley
2011-09-30, 17:22
Doug Cutting
2011-09-30, 18:23
Arun C Murthy
2011-09-30, 18:29
Roman Shaposhnik
2011-09-30, 18:44
Todd Lipcon
2011-09-30, 20:13
Arun C Murthy
2011-09-30, 22:00
Konstantin Boudnik
2011-09-30, 23:44
Konstantin Shvachko
2011-10-02, 02:13
Konstantin Shvachko
2011-10-02, 02:13
Eric Baldeschwieler
2011-10-03, 18:45
Arun C Murthy
2011-10-17, 17:17
Ted Yu
2011-10-17, 20:27
郭顺旭
2011-10-18, 08:52
Steve Loughran
2011-10-18, 09:36
Steve Loughran
2011-10-18, 11:36
Todd Lipcon
2011-10-18, 23:40
Harsh J
2011-10-19, 01:56
Steve Loughran
2011-10-19, 09:35
-
Update on hadoop-0.23 Arun C Murthy 2011-09-26, 21:07
Greetings,

I thought I'd drop a note to update folks on progress of hadoop-0.23.

Things have been very busy in hadoop-0.23 land. We continue to crank through the issues and get ready to ship. We are mostly past the initial teething pains of moving our entire build infrastructure to Maven - many thanks to Alejandro, Tom, Giri & Eric Yang.

HDFS is nearly there:
# HDFS Federation and Client side mount tables have been tested with ~300 node clusters with security turned on.
# HDFS upgrades have been tested from 0.20.2xx.
# Functional tests for HDFS are complete.

NextGen MapReduce (aka MRv2, aka YARN) is coming along great:
# We are happy to report we've done extensive scale testing to confirm stability
  - Sort/GridMixv3 etc. at ~350 nodes
  - Scale testing with simulated clusters of ~1500 nodes
# Functional tests for all of MapReduce functionality
# Pig (0.9 & 0.9.1) working with NextGen MapReduce
# All above have been done with no regressions in security.

We are about to finish performance certification for both HDFS & MapReduce in the next couple of weeks too, after which we start integration tests with HBase, Hive, Oozie etc.

We have cranked through 75 bugs in September alone (http://s.apache.org/mr-sept) and have another 50-ish bugs to go... we have at least 4 different organizations contributing patches to MRv2 in Sept alone: Yahoo, Hortonworks, LinkedIn & Huawei.

Given where we are I'm confident we can have a strong hadoop-0.23.0 release by late October. The current plan is to deploy to alpha clusters in November. Citius, Altius, Fortius! :)

Thanks to everyone who contributed, look forward to continued help.

Arun

PS: I'll continue to provide periodic updates as we get closer to a hadoop-0.23.0 release.
-
Re: Update on hadoop-0.23 Roman Shaposhnik 2011-09-26, 22:15
Hi Arun!

Great news! Hopefully you wouldn't mind answering some of the questions below...

On Mon, Sep 26, 2011 at 2:07 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
> NextGen MapReduce (aka MRv2, aka YARN) is coming along great:
> # We are happy to report we've done extensive scale testing to confirm stability
> - Sort/GridMixv3 etc. at ~350nodes
> - Scale testing with simulated clusters of ~1500 nodes
> # Functional tests for all of MapReduce functionality
> # Pig (0.9 & 0.9.1) working with NextGen MapReduce

Is there a *released* version of Pig that compiles cleanly against .23 snapshots? Same question for Hive.

> We are about to finish performance certification for both HDFS & MapReduce in the next
> couple of weeks too, after which we start integration tests with HBase, Hive, Oozie etc.

I'm curious -- what are these integration tests? Can I take a look at them? It would be really nice if we could leverage those via Bigtop infrastructure. Currently we have a certain # of integration tests in Bigtop that we're running against a fully deployed stack, but it would be quite nice to have extra coverage.

> Given where we are I'm confident we can have a strong hadoop-0.23.0 release
> by late October. The current plan is to deploy to alpha clusters in November. Citius, Altius, Fortius! :)

Could you, please, elaborate on what will be part of that deployment? Which versions of Pig, Hive, HBase, Oozie and Mahout are you targeting?

Thanks,
Roman.
-
Re: Update on hadoop-0.23 Arun C Murthy 2011-09-27, 06:20
Roman,

In general, we'll need to make changes upstream:
# I believe someone got HBase working.
# We made changes to Pig - rather we got help from the Pig team, particularly Daniel.

So, we plan to work through the rest of the stack - Hive, Oozie etc. very soon and we'll depend on updated releases from the individual projects.

Arun
-
Re: Update on hadoop-0.23 Arun C Murthy 2011-09-27, 06:40
On Sep 26, 2011, at 11:20 PM, Arun C Murthy wrote:
> Roman,
>
> In general, we'll need to make changes upstream:
> # I believe someone got HBase working.
> # We made changes to Pig - rather we got help from the Pig team, particularly Daniel.
>
> So, we plan to work through the rest of the stack - Hive, Oozie etc. very soon and we'll depend on updated releases from the individual projects.

To clarify, the changes to Pig were mainly due to its usage of the Context Objects APIs, which changed in hadoop-0.21/hadoop-0.22.

Also, we expect some pieces of the stack to change if they rely on undocumented/hidden features in MR.

We are absolutely committed to ensuring end-user MR applications have full compatibility - to this end we have, long since, marked the old APIs as stable & supported i.e. un-deprecated them.

Arun
-
Re: Update on hadoop-0.23 Doug Cutting 2011-09-27, 15:50
On 09/26/2011 02:07 PM, Arun C Murthy wrote:
> We are about to finish performance certification for both HDFS &
> MapReduce in the next couple of weeks too, after which we start
> integration tests with HBase, Hive, Oozie etc.

Who's 'we' here? I haven't seen this happening on the list, so I assume here you mean you and others working privately? This is great to hear, though. Can you provide any more details?

BTW, has anyone else been benchmarking the 0.23 branch yet? Can they talk about their experiences?

It's great to see 0.23 take shape. Thanks for your efforts here, Arun.

Doug
-
Re: Update on hadoop-0.23 Roman Shaposhnik 2011-09-27, 19:24
Hi Arun!

Thanks for the quick reply!

I'm sorry if I had too many questions in my original email, but I can't find an answer to my "integration tests" question. Could you, please, share a URL with us where I can find out more about them?

On Mon, Sep 26, 2011 at 11:20 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
> # We made changes to Pig - rather we got help from the Pig team, particularly Daniel.
>
> So, we plan to work through the rest of the stack - Hive, Oozie etc. very soon and we'll
> depend on updated releases from the individual projects.

Do we have any kind of commitment from downstream projects as far as those updates are concerned? Are they targeting these changes as part of a point (patch) release of an already released version (like Pig 0.9.X for example) or will it be part of a brand new major release?

Thanks,
Roman.
-
Re: Update on hadoop-0.23 Todd Lipcon 2011-09-27, 19:56
Hi all,

Just an update from the HBase side: I've run some cluster tests on HDFS 0.23 (as of about a month ago) and it generally works well. Performance for some workloads is ~2x due to HDFS-941, and can be improved a bit more if I finish HDFS-2080 in time. I did not do extensive failure testing (to stress the new append/sync code) but I do plan to do that in the coming months.

HBase trunk can compile against 0.23 by using -Dhadoop23 on the Maven build. Currently some 15 or so tests are failing - the following HBase JIRA tracks those issues: https://issues.apache.org/jira/browse/HBASE-4254 (these may be indicative of HDFS side bugs). Any help there from the community would be appreciated!

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
-
Re: Update on hadoop-0.23 Arun C Murthy 2011-09-28, 20:55
Roman,

On Sep 27, 2011, at 12:24 PM, Roman Shaposhnik wrote:
> I'm sorry if I had too many questions in my original email, but I can't find
> an answer to my "integration tests" question. Could you, please, share
> a URL with us where I can find out more about them?

As you know, me & my team at HW along with folks at Y do a lot of manual testing along with tests like GridMix/PigMix etc.

The basic idea is to test all features of HDFS, MapReduce, Streaming, Pipes etc. Similarly for Pig, Hive, Oozie. I'm sure none of this is news to you.

Similarly we test for performance for all aspects of MapReduce.

> On Mon, Sep 26, 2011 at 11:20 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
>> # We made changes to Pig - rather we got help from the Pig team, particularly Daniel.
>>
>> So, we plan to work through the rest of the stack - Hive, Oozie etc. very soon and we'll
>> depend on updated releases from the individual projects.
>
> Do we have any kinds of commitment from downstream projects as far as those
> updates are concerned?

The easiest way to get 'commitment' is to test and provide patches as necessary:
# mapreduce-dev@ helped pig-dev@ to do it for Pig
# Todd has done it for HBase, hdfs-dev@ will further help as necessary
# mapreduce-dev@ will soon help out for Hive, Oozie ... etc.

> Are they targeting these changes as part of point (patch)
> release of an already released version (like Pig 0.9.X for example) or
> will it be
> part of a brand new major release?

I don't know about specific releases for all projects:
# Pig will work in 0.9.1 or 0.9.2 and beyond.
# HBase trunk works (refer to Todd's msg) - the actual release depends on the HBase community for a release.

Historically, we make a release of Hadoop Core and then work through related projects to make necessary changes - mostly minor, sometimes more. Thus, having an early release is important.

Arun
-
Re: Update on hadoop-0.23 Jeff Hammerbacher 2011-09-29, 18:00
> As you know, me & my team at HW along with folks at Y do a lot of manual
> testing along with tests like GridMix/PigMix etc.
>
> The basic idea is to test all features of HDFS, MapReduce, Streaming, Pipes
> etc. Similarly for Pig, Hive, Oozie. I'm sure none of this is news to you.
>
> Similarly we test for performance for all aspects of MapReduce.

Why not make the code for these tests available to the community via Bigtop?
-
Re: Update on hadoop-0.23 Eric Baldeschwieler 2011-09-29, 19:35
Hi Jeff,

This seems like a great opportunity for you to add some value. I'd welcome that.

It seems rude to me to beat up the folks who have been driving the majority of the work on a project to do more. In general I don't think it's good open source etiquette to ask others to contribute their time to address your concerns. This is a community of volunteers after all.

If you're wondering why I am asserting that Arun and company have done the majority of the work on 23, check out the last graph on this post, or look at the commit logs.

http://www.hortonworks.com/the-yahoo-effect/

If you are interested in doing some work, I'd suggest starting by wiring in the work on gridmix and pigmix that members of our mapreduce and pig teams have contributed. That's several man-years worth of testing contributed to the community. These are the centerpieces of our testing. Go wild.

Follow your passion,

E14
-
Re: Update on hadoop-0.23 Doug Cutting 2011-09-29, 21:38
On 09/29/2011 12:35 PM, Eric Baldeschwieler wrote:
> If you're wondering why I am asserting that arun and company have
> done the majority of the work on 23, check out the last graph on this
> post, or look at the commit logs.

The ASF discourages the use of Java's @author tag in large part because it tends to mark code as the territory of particular contributors. We want ASF codebases to be the responsibility of an entire community.

Claiming that one party has contributed more than all others together seems to me to be a similar claim of ownership and a demand for credit. Folks should contribute to the ASF because they want the contributions of others to join their contributions, not so they can gain credit.

Also, I'd be concerned for the health of a project if one group was really doing nearly all of the contribution.

> http://www.hortonworks.com/the-yahoo-effect/

Hmm. Lines of code are not proportional to effort. The stacked cumulative histogram makes slopes steeper for those who happen to be on top. And the codebase did not start from zero in 2006.

Here are some other reports for 0.23 that make contribution look pretty diverse and healthy.

http://s.apache.org/tI
http://s.apache.org/9mp
http://s.apache.org/6E2

Statistics are fun, aren't they!

Doug
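
[Editor's illustration] One reading of the remark above about stacked cumulative charts can be sketched in a few lines of Python. The numbers below are invented for illustration (three hypothetical groups contributing identical amounts each period); they are not taken from the linked post or reports, and the helper names `cumulative`, `stacked_upper_edges` and `slope` are assumptions made for this sketch only.

```python
# Hypothetical per-period contribution counts for three groups.
# These values are invented; they do not come from any Hadoop report.
contributions = {
    "group_a": [10, 10, 10, 10],
    "group_b": [10, 10, 10, 10],
    "group_c": [10, 10, 10, 10],
}


def cumulative(values):
    """Running total of a list of per-period values."""
    total = 0
    out = []
    for v in values:
        total += v
        out.append(total)
    return out


def stacked_upper_edges(series_by_name, order):
    """Upper edge of each band when cumulative series are stacked in `order` (bottom first)."""
    length = len(next(iter(series_by_name.values())))
    baseline = [0] * length
    edges = {}
    for name in order:
        cum = cumulative(series_by_name[name])
        # Each band sits on top of everything stacked before it.
        baseline = [b + c for b, c in zip(baseline, cum)]
        edges[name] = baseline
    return edges


def slope(edge):
    """Average rise per period of a band's upper edge."""
    return (edge[-1] - edge[0]) / (len(edge) - 1)


edges = stacked_upper_edges(contributions, ["group_a", "group_b", "group_c"])
slopes = {name: slope(edge) for name, edge in edges.items()}
print(slopes)  # {'group_a': 10.0, 'group_b': 20.0, 'group_c': 30.0}
```

Although all three groups contribute exactly the same amount, the upper edge of the top band rises three times as fast as that of the bottom band, purely because of stacking order. The thickness of each band, not the steepness of its edge, is what reflects the contribution.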
-
Re: Update on hadoop-0.23 Eric Baldeschwieler 2011-09-30, 05:27
Hi Doug, Jeff, Roman

Let me rephrase my point. I'd like to request that folks take bigtop project discussions onto the bigtop lists and don't greet status reports on general@hadoop with insinuations that folks who are working really hard on this project should be contributing different things to another project or are somehow misbehaving by testing on their own infrastructure with their own users. Any kind of testing is a gift to the community and adds value. You are all welcome to contribute too. If you find issues, then file JIRAs and work on the appropriate project lists. I believe that observing these points of etiquette will help this project continue to prosper.

I agree with you that the Hadoop project is healthy. I'll leave the stats discussion to folks who want to dig through the data. I'm happy to go through the details with you offline.

Thanks,

E14
-
Re: Update on hadoop-0.23 Konstantin Shvachko 2011-09-30, 09:23
On Thu, Sep 29, 2011 at 10:27 PM, Eric Baldeschwieler <[EMAIL PROTECTED]> wrote:
> Hi Doug, Jeff, Roman
>
> I'd like to request that folks take bigtop project discussions onto
> the bigtop lists and don't greet status reports on general@hadoop

I am personally very interested in the results of testing of 0.22 with BigTop, or other tools, or without any tools. So I'd like to ask (rather than request) good people to continue posting your findings on the general@hadoop list.

Eric, thank you for your continuous contributions to Apache Hadoop.

I also think that general@hadoop is the right place to discuss inter-project issues like making HBase, Pig, and Hive work on Hadoop 0.22 and 0.23. Where else?

Thanks,
--Konstantin
-
Re: Update on hadoop-0.23 Steve Loughran 2011-09-30, 09:34
On 29/09/2011 22:38, Doug Cutting wrote:
> The ASF discourages the use of Java's @author tag in large part because
> it tends to mark code as the territory of particular contributors. We
> want ASF codebases to be the responsibility of an entire community.

There's another reason, which is that if you have your name next to some code, you get emails asking about it for the rest of your life. Anonymity offers deniability.
-
Re: Update on hadoop-0.23 Steve Loughran 2011-09-30, 10:17
On 30/09/2011 06:27, Eric Baldeschwieler wrote:
> Hi Doug, Jeff, Roman
>
> Let me rephrase my point. I'd like to request that folks take bigtop project discussions onto the bigtop lists and don't greet status reports on general@hadoop with insinuations that folks who are working really hard on this project should be contributing different things to another project or are somehow misbehaving by testing on their own infrastructure with their own users. Any kind of testing is a gift to the community and adds value. You are all welcome to contribute too. If you find issues, then file JIRAs and work on the appropriate project lists. I believe that observing these points of etiquette will help this project continue to prosper.

Bigtop is an attempt to have a coherent test & release process, with full stack testing, release artifacts tested on a set of platforms, and a codebase that has matured out of Cloudera. I don't care about origin, all I want is consistent releases of compatible artifacts -and the testing to back up the claims of compatibility. The artifacts should be those things people install -RPMs, debs- ideally the tests should start on small clusters, then scale up to production size before release.

There are things happening in the hadoop core that mimic some of the features here -RPMs- but appear to be lacking the full stack functional testing which is a goal of bigtop.

> I agree with you that the Hadoop project is healthy.

How do you define health in this context?

1. There is a 0.20.20x branch that is the one people use in production -the stable one. The API is behind the 0.21+ feature set, and so is less convenient to code against. It picks up features as well as fixes, which I find troublesome. You don't see new features going into RHEL5.x or Ubuntu LTS releases. Yes, I know users like those features, but it could be due to a slow release of new versions that they trust to work and preserve data. It's healthy, but the backport of features creates inertia.

2. There is the 0.23 branch that everyone -especially Arun- is working on, which is really promising, though some of the features (federation, YARN) are going to be fairly traumatic in rollout. That doesn't mean they aren't good, only that switching to them will have surprises.

3. There's 0.22 which is going to combine the API of 0.21 with the fixes of 0.20.20x *and* will be the last release of the MR1.0 engine. For that last reason, I think there's value in pushing it out, though it's going to take time, and there's a risk of it adding another branch to be maintained for an indeterminate period.

4. There are the third party "compatible" projects, CDH, MapR, EMC HD, Amazon Elastic MR, which are all declaring compatibility with 0.20.x; no stated plans when/how to move to 0.23+

I would say Hadoop is incredibly successful -it's generating lots of interest, is being used by big companies, it has almost singlehandedly revitalised server-side Java dev, it is the foundation for an OSS version of the MS Azure stack. But for that latter goal to be achieved -it's what I want- we need to move forward on releases where the entire stack is consistent, releases that people want to use. For that consistency, I'd like bigtop to be a subject people can talk about here, just like MRUnit, which will be needed now that 0.23+ removes the MiniMRCluster feature.

-steve
-
Re: Update on hadoop-0.23 Milind.Bhandarkar@... 2011-09-30, 16:18
>3. There's 0.22 which is going to combine the API of 0.21 with the fixes
>of 0.20.20x *and* will be the last release of the MR1.0 engine. For that
>last reason, I think there's value in pushing it out, though it's going
>to take time, and there's a risk of it adding another branch to be
>maintained for an indeterminate period.

+1

- milind

---
Milind Bhandarkar
Greenplum Labs, EMC
(Disclaimer: Opinions expressed in this email are those of the author, and do not necessarily represent the views of any organization, past or present, the author might be affiliated with.)
-
Re: Update on hadoop-0.23 Andrew Purtell 2011-09-30, 16:34
This time it seems easy to split the difference here.

- There is sufficient interest in Bigtop, so announcements and discussions can/should go to general@.*

- There is no need to (and a request not to) inject exhortations to participate in Bigtop into random other topics on general@, such as status reports by another project or group. Simply create new threads to discuss Bigtop matters.

* - Seems to me a community effort to qualify an integrated stack top to bottom is a good thing, but I question doing this for 0.22, which nobody is going to use much, or so I hear.

Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
-
Re: Update on hadoop-0.23 Matt Foley 2011-09-30, 17:22
>> Sufficient interest in Bigtop so announcements and discussions can/should go to general@.*

Why wouldn't this work like other projects, and Bigtop discussions go to the Bigtop mailing lists? (Announcements, of course, do belong on general.) People interested in Bigtop discussions sign up for the Bigtop mailing lists. I'm going to go do that right now, since I am. :-)

--Matt
-
Re: Update on hadoop-0.23 Doug Cutting 2011-09-30, 18:23
On 09/30/2011 03:17 AM, Steve Loughran wrote:
> 4. There are the third party "compatible" projects, CDH, MapR, EMC HD,
> Amazon Elastic MR, which are all declaring compatibility with 0.20.x; no
> stated plans when/how to move to 0.23+

CDH4 will include 0.23 (hopefully without any patches). I expect to see an alpha release of CDH4 late this year and a production-ready release early next year.

Doug
-
Re: Update on hadoop-0.23 Arun C Murthy 2011-09-30, 18:29
On Sep 30, 2011, at 3:17 AM, Steve Loughran wrote:
>
> 3. There's 0.22 which is going to combine the API of 0.21 with the fixes of 0.20.20x *and* will be the last release of the MR1.0 engine. For that last reason, I think there's value in pushing it out, though it's going to take time, and there's a risk of it adding another branch to be maintained for an indeterminate period.

I'm all for people working on what they are passionate about, so this isn't to say one shouldn't spend time on 0.22.

But, for clarity's sake, as I've done multiple times on both the list and in person to Konstantin etc., I'll point out (again) that 0.22 will need multiple man-years of development to achieve parity with 0.20.2xx just in terms of bug-fixes and performance. Then there is security, multi-tenancy etc. which regress significantly vis-a-vis 0.20.2xx. Then there is scaling etc.

0.23 is already past all of these hurdles and very close to meeting, if not beating, 0.20.2xx in performance. It already beats 0.20.2xx in lots of dimensions (improved shuffle with zero-copy etc.).

So, unless folks plan to invest this gargantuan time, please do not say that 0.21 has fixes from 0.20.2xx. That's all I ask.

Thus, 0.20.2xx may well be the last _viable_ release of MR1 engine.

Arun
-
Re: Update on hadoop-0.23 Roman Shaposhnik 2011-09-30, 18:44
Hi Arun!

On Fri, Sep 30, 2011 at 10:54 AM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
> I'm all for people working on what they are passionate about, so this isn't to say one shouldn't spend time on 0.22.
>
> But, for clarity's sake, as I've done multiple times on both the list and in person to Konstantin etc.,
> I'll point out (again) that 0.22 will need multiple man-years of development to achieve parity with
> 0.20.2xx just in terms of bug-fixes and performance.

I apologize if my level of institutional knowledge of these things is lacking, but do you have any benchmarking results between 0.22 and 0.20.2xx? The reason I'm asking is twofold -- I really would like to see objective numbers qualifying the viability of 0.22 from the performance standpoint, but more importantly I would really like to include the benchmarking code into Bigtop.

In terms of bugs -- same question. Is there any publicly available list of, at least, the critical ones that make 0.22 not viable from your point of view?

Thanks,
Roman.
-
Re: Update on hadoop-0.23Todd Lipcon 2011-09-30, 20:13
On Fri, Sep 30, 2011 at 11:44 AM, Roman Shaposhnik <[EMAIL PROTECTED]> wrote:
> I apologize if my level of institutional knowledge of these things is lacking, but do you have any benchmarking results between 0.22 and 0.20.2xx? The reason I'm asking is twofold -- I really would like to see an objective numbers qualifying the viability of 0.22 from the performance stand point, but more importantly I would really like to include the benchmarking code into Bigtop.

0.22 currently suffers from MAPREDUCE-2266, which, last time I benchmarked it, caused a significant slowdown. iirc a terasort ran something like twice as slow on my test cluster due to this bug. 0.23/MR2 doesn't suffer from this bug.

-Todd
--
Todd Lipcon
Software Engineer, Cloudera
-
Re: Update on hadoop-0.23Arun C Murthy 2011-09-30, 22:00
On Sep 30, 2011, at 1:13 PM, Todd Lipcon wrote:

> On Fri, Sep 30, 2011 at 11:44 AM, Roman Shaposhnik <[EMAIL PROTECTED]> wrote:
>> I apologize if my level of institutional knowledge of these things is lacking, but do you have any benchmarking results between 0.22 and 0.20.2xx? The reason I'm asking is twofold -- I really would like to see an objective numbers qualifying the viability of 0.22 from the performance stand point, but more importantly I would really like to include the benchmarking code into Bigtop.
>
> 0.22 currently suffers from MAPREDUCE-2266, which, last time I benchmarked it, caused a significant slowdown. iirc a terasort ran something like twice as slow on my test cluster due to this bug. 0.23/MR2 doesn't suffer from this bug.

I don't really know where to start. CHANGES.txt in branch-0.20-security has the full list.

If I remember right, long ago (late 2009) we benchmarked .21 with gridmix and saw >30% prior to abandoning .21.

Since then 0.20.2xx has had innumerable improvements to JobTracker, TaskTracker etc. etc.
# JobTracker itself is almost thrice as fast as it used to be in 2009.
# The scheduler is significantly better (>2x locality) and throughput.
# TaskTracker has had innumerable fixes for dist.cache, task launch, shutdown (MR-2266 and lots of other similar fixes).
# The MR runtime has fixes for latency on innumerable fronts.

Other regressions:
# Security
# Support for multi-tenant clusters.
# Tonnes of operability fixes (jobhistory, task logs i.e. MR-1100) for running MR clusters.

The one redeeming aspect for .22 is the shuffle based on the work we did for winning Terasort/Petasort in 2009 but 0.23 has even more work there with zero-copy with netty (yaay! no more jetty! Thanks to @cdouglas).

> In terms of bugs -- same question. Is there any publicly available list of, at least, the critical ones that make 0.22 not viable from your point of view?

We marked a lot of them as blockers on .22 and they were discarded by the release master(s). branch-0.20-security/CHANGES.txt is the full list. I really can't spend time enumerating over 4000 commits and > 2000 (?) jiras to that branch at this point.

In my opinion, as someone who has helped develop/run/support very large installs and done this for over 5 1/2 years, a major release with regression on features (security, multi-tenancy) and scalability, performance etc. is distinctly _unviable_.

----

Again, none of this is meant to say you should invest time on fixing them or releasing 0.22 as it stands - just, please, don't label it in a manner which helps build unreasonable expectations among users about its viability & usability.

thanks,
Arun
-
Re: Update on hadoop-0.23Konstantin Boudnik 2011-09-30, 23:44
BTW, Roman, I have a recollection of a Hadoop performance suite for iTest (aka BigTop now) which I put together during the initial development phase of the framework. I don't see it in BigTop's source tree - has this work ever been committed to open source along with iTest? This way, if any benchmarking tests for 0.22 (or 0.20.2xx) are opened up, we can put them into the same container, conditional on their current copyright holder letting them see the light of day.

With regards,
Cos

Disclaimer: apologies for the seemingly off-topic discussion, but this is about Hadoop performance testing, so bringing up BigTop in this frame of reference looks completely justifiable.
-
Re: Update on hadoop-0.23Konstantin Shvachko 2011-10-02, 02:13
On Fri, Sep 30, 2011 at 9:34 AM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
> * - Seems to me a community effort to qualify an integrated stack top to bottom is a good thing, but I question doing this for 0.22, which nobody is going to use much, or so I hear.

Andrew, what I hear here in the East Bay is that my 500 users will. I also hear that if 0.22 was available people would use it now.

Thanks,
Konstantin
-
Re: Update on hadoop-0.23Konstantin Shvachko 2011-10-02, 02:13
I am very glad that the development and testing of 0.23 is going so well.
I see a lot of commits and hundreds of changes going in literally every day. It is great to see the new technology building!

On the criticism of the 0.22 release: Arun has a top-down view, and I agree a lot of progress has been achieved with the framework. My bottom-up view is that you first need a reliable storage layer. If the file system loses blocks or, worse, messes up the image and/or journals, the performance of the framework is your second problem. I have said that before. Based on my experience it takes time to stabilize a file system. Anybody seen one that has been stabilized in less than 2 years?

I do not see the 0.22 release as a wasted effort. And if the progress with it contributes to the 0.23 rush I am twice as happy.

Thanks,
--Konstantin
-
Re: Update on hadoop-0.23Eric Baldeschwieler 2011-10-03, 18:45
Thanks Andy,
I think this is a clear summary of what would be a good outcome. I would suggest that detailed bigtop discussions should go to bigtop, but status updates are undoubtedly interesting to this audience.

But I do request that folks not "inject exhortations to participate in Bigtop into random other topics on general@, such as status reports by another project or group".

E14

On Sep 30, 2011, at 9:34 AM, Andrew Purtell wrote:

> This time it seems easy to split the difference here.
>
> - Sufficient interest in Bigtop so announcements and discussions can/should go to general@.*
>
> - There is no need to (and a request not to) inject exhortations to participate in Bigtop into random other topics on general@, such as status reports by another project or group. Simply create new threads to discuss Bigtop matters.
>
> * - Seems to me a community effort to qualify an integrated stack top to bottom is a good thing, but I question doing this for 0.22, which nobody is going to use much, or so I hear.
>
> Best regards,
>
> - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
-
Update on hadoop-0.23Arun C Murthy 2011-10-17, 17:17
Folks,
Quick note - the dev community continues to scramble to get things wrapped up on hadoop-0.23.

We are down to ~30 blockers and I hope to see them resolved over the next two weeks!

Also, I feel Alejandro and Tom can finish up the remaining mavenization bits by then too - as I see it, it's very close... thanks guys!

Once done, I plan to call a vote on a hadoop-0.23.0 which we can start deploying (and further stabilizing) right away.

My hope is that hadoop-0.23.0 is a strong alpha which we can then beat into shape afterwards. The idea is to ship soon so we get folks to play with it and help downstream projects integrate; e.g., Pig already works, and I know Todd is working on getting HBase to play well too.

thanks,
Arun
-
Re: Update on hadoop-0.23Ted Yu 2011-10-17, 20:27
On behalf of Harsh, w.r.t. HBASE-4510, HDFS-1620 related changes downstream (For compiling with HDFS 0.23+) <https://issues.apache.org/jira/browse/HBASE-4510>:

I need to open up some HDFS jiras and change the way HBase uses safemode determinism in the meanwhile.

Cheers
-
Re:Update on hadoop-0.23郭顺旭 2011-10-18, 08:52
Great news!
-
Re: Update on hadoop-0.23Steve Loughran 2011-10-18, 09:36
On 17/10/11 18:17, Arun C Murthy wrote:
> Folks,
>
> Quick note - the dev community continues to scramble to get things wrapped up on hadoop-0.23.
>
> We are down to ~30 blockers and I hope to see them resolved over the next two weeks!
>
> Also, I feel Alejandro and Tom can finish up the remaining mavenization bits by then too - as I see it, it's very close... thanks guys!
>
> Once done, I plan to call a vote on a hadoop-0.23.0 which we can start deploying (and further stabilizing) right-away.
>
> My hope is that hadoop-0.23.0 is a strong alpha which we can then beat into shape after, the idea is to ship soon so we get folks to play with it and help downstream projects to integrate for e.g. Pig already works, and I know Todd is working on getting HBase to play well too.

This is good, but I can see enough changes that we will need broad testing to be confident there is no regression.

-I propose that a "pre-alpha" is done ASAP, to test the release process and let people playing with YARN, the MR engine and writing tools have something more stable than SNAPSHOT- to play with, then maybe a fast 2-4 cycle of alpha releases for a bit.
-I can add the JIRA release numbers if you give me a list.
-Where do you think the troublespots for deployment and regressions will be?
-Anything that uses MiniMRCluster is going to go, and the migration strategy needs to be on the wiki (I can help there once I know what to do)
-HBase, Hama, bigtop, MRUnit should all be pulled into the release process as part of the regression tests
-It'd be good for people doing in-cluster tests to document cluster size, network config etc so we can identify what works & what doesn't, though as that relies on people discussing their cluster details it may be a bit patchy.
-HDFS migration; there really needs to be a way to test FS upgrades from various Hadoop versions, including Cloudera's
-upgrades with entries in the edit log to replay
-
Re: Update on hadoop-0.23Steve Loughran 2011-10-18, 11:36
On 17/10/11 18:17, Arun C Murthy wrote:
One more thing: are the ProtocolBuffers needed for all installations, or is that a compile-time requirement? If the binaries are going to be required, there's going to have to be one built for the various platforms, and source.deb/RPM files to build themselves on Linux. I'd rather avoid all that work.
-
Re: Update on hadoop-0.23Todd Lipcon 2011-10-18, 23:40
On Tue, Oct 18, 2011 at 4:36 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:
> One more thing: are the ProtocolBuffers needed for all installations, or is that a compile-time requirement? If the binaries are going to be required, there's going to have to be one built for the various platforms, and source.deb/RPM files to build themselves on Linux. I'd rather avoid all that work

The protobuf java jar is required at runtime. protoc (native) is only required at compile time.

-Todd
--
Todd Lipcon
Software Engineer, Cloudera
-
Re: Update on hadoop-0.23Harsh J 2011-10-19, 01:56
HBase trunk compiles with 0.23 now, after Todd's work on it -- I've
updated https://issues.apache.org/jira/browse/HBASE-4510 with further details.

-- Harsh J
-
Re: Update on hadoop-0.23Steve Loughran 2011-10-19, 09:35
On 19/10/11 00:40, Todd Lipcon wrote:
> On Tue, Oct 18, 2011 at 4:36 AM, Steve Loughran<[EMAIL PROTECTED]> wrote:
>> One more thing: are the ProtocolBuffers needed for all installations, or is that a compile-time requirement? If the binaries are going to be required, there's going to have to be one built for the various platforms, and source.deb/RPM files to build themselves on Linux. I'd rather avoid all that work
>
> The protobuf java jar is required at runtime. protoc (native) is only required at compile time.

OK, I've added notes on this in the wiki, please review and correct where I have fundamental misunderstandings

http://wiki.apache.org/hadoop/ProtocolBuffers
|