Erik Paulson
2013-01-15, 23:50
Todd Lipcon
2013-01-16, 01:44
Andy Isaacson
2013-01-16, 02:08
Surenkumar Nihalani
2013-01-16, 03:38
Steve Loughran
2013-01-16, 08:40
Glen Mazza
2013-01-16, 13:31
Erik Paulson
2013-01-21, 16:36
Colin McCabe
2013-01-21, 18:31
Gopal Vijayaraghavan
2013-01-16, 14:17
Hitesh Shah
2013-01-16, 19:18
-
development environment for hadoop core
Erik Paulson 2013-01-15, 23:50

Hello -

I'm curious what Hadoop developers use for their day-to-day hacking on
Hadoop. I'm talking changes to the Hadoop libraries and daemons, and not
developing Map-Reduce jobs or using the HDFS Client libraries to talk
to a filesystem from an application.

I've checked out Hadoop, made minor changes and built it with Maven, and
tracked down the resulting artifacts in a target/ directory that I could
deploy. Is this typically how a cloudera/hortonworks/mapr/etc dev works, or
are the IDEs more common?

I realize this sort of sounds like a dumb question, but I'm mostly curious
what I might be missing out on if I stay away from anything other than vim,
and not being entirely sure where maven might be caching jars that it uses
to build, and how careful I have to be to ensure that my changes wind up in
the right places without having to do a clean build every time.

Thanks!

-Erik
-
Re: development environment for hadoop core
Todd Lipcon 2013-01-16, 01:44

Hi Erik,

When I started out on Hadoop development, I used emacs for most of my
development. I eventually "saw the light" and switched to eclipse with a
bunch of emacs keybindings - using an IDE is really handy in Java for
functions like "find callers of", quick navigation to types, etc. etags
gets you part of the way, but I'm pretty sold on eclipse at this point.

The other big advantage I found of Eclipse is that the turnaround time on
running tests is near-instant - make a change, hit save, and run a unit
test in a second or two, instead of waiting 20+sec for maven (even on a
non-clean build).

That said, for quick fixes or remote debugging work I fall back to vim
pretty quickly.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
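For readers who stay on the command line, the nearest Maven equivalent of the one-test loop Todd describes is to point Surefire at a single test class. The following is a minimal sketch, assuming the module's dependencies are already installed in the local repository; the `run` helper, the `DRY_RUN` variable, and the module path and class name in the usage comment are illustrative and do not come from the thread.

```shell
# Echo the command instead of executing it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run one JUnit test class in one module via Surefire's -Dtest property.
# $1 = module directory containing a pom.xml, $2 = test class name.
run_one_test() {
  run mvn -f "$1/pom.xml" test "-Dtest=$2"
}

# Usage (module path and class name are placeholders):
#   run_one_test hadoop-hdfs-project/hadoop-hdfs TestExample
```

This still pays Maven's startup cost on every run, so it narrows the gap with an IDE rather than closing it.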
-
Re: development environment for hadoop core
Andy Isaacson 2013-01-16, 02:08

On Tue, Jan 15, 2013 at 3:50 PM, Erik Paulson <[EMAIL PROTECTED]> wrote:
> I'm curious what Hadoop developers use for their day-to-day hacking on
> Hadoop. I'm talking changes to the Hadoop libraries and daemons, and not
> developing Map-Reduce jobs or using the HDFS Client libraries to talk
> to a filesystem from an application.
>
> I've checked out Hadoop, made minor changes and built it with Maven, and
> tracked down the resulting artifacts in a target/ directory that I could
> deploy. Is this typically how a cloudera/hortonworks/mapr/etc dev works, or
> are the IDEs more common?

I use both vim and Eclipse (3.8.0~rc4-1 from Debian). I use git for
version control with a branch per JIRA. Most testing is done with
JUnit tests; I try to write a testcase to repro a bug before trying to
fix the bug. Sometimes for a particular bug I need to install
artifacts on a cluster (of VMs or physical machines) during the
edit-compile-debug cycle; in such cases I build with mvn and carefully
choose which artifacts need to be updated on the target cluster, using
rsync to speed up the cycle.

It's pretty difficult to develop in Java without using Eclipse or
similar. Like Todd I stuck to my preferred editor environment for
several months but found the IDE crutch too useful to avoid entirely.
Luckily nowadays Eclipse and vim synchronize through the filesystem
pretty well (much better than 6-8 years ago); I haven't yet lost even
a single line of code due to "oh you edited the same file in two
editors and they overwrote one another"; both vim and Eclipse
carefully say "It was changed on disk! Oh Noes! What shall we do?".

You can run JUnit tests from either Eclipse or mvn, and I do both regularly.

-andy
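The rsync step Andy mentions might look roughly like the sketch below. It is only an illustration under assumptions: the host name, remote directory, and jar paths in the usage comment are placeholders, and the `run` helper with its `DRY_RUN` switch is a hypothetical convenience for previewing the command, not something from his setup.

```shell
# Echo the command instead of executing it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Copy only the selected, freshly built jars to one node of a test cluster.
# $1 = ssh host, $2 = remote directory, remaining args = local jar paths.
push_jars() {
  host="$1"
  dest="$2"
  shift 2
  run rsync -av "$@" "$host:$dest/"
}

# Usage (all names are placeholders):
#   push_jars node1 /opt/hadoop/share/hadoop/hdfs target/hadoop-hdfs-3.0.0-SNAPSHOT.jar
```

Copying a hand-picked set of jars keeps the cycle short, at the cost of having to know which artifacts a given change actually touches.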
-
Re: development environment for hadoop core
Surenkumar Nihalani 2013-01-16, 03:38

I use Eclipse. I haven't figured out how to run and use mvn from it, so I
just use it as an editor. I have a git repo in commons/src, with a branch
for each JIRA. I rebase the branches to keep pulling in svn updates.
-
Re: development environment for hadoop core
Steve Loughran 2013-01-16, 08:40

My setup (I work from home):

# OS/X laptop w/ 30" monitor
# FTTC broadband, 55Mbit/s down, 15+ up - it's the upload bandwidth that
  really helps development: http://www.flickr.com/photos/steve_l/8050751551/
# IntelliJ IDEA IDE, settings edited for a 2GB Heap
# Maven on the command line for builds
# I run a "mvn install -DskipTests" every morning to ensure that apache's
  own -SNAPSHOT artifacts aren't pulled in.
# CentOS 6.3 VM for doing the full binary build & test, making my own
  RPMs, etc.
# coffee.

One thing that annoys me is that I've got an airplay-driven hifi set up,
and during builds there's enough CPU/RAM load that the music has dropouts.
Whoever thought of streaming over UDP without an option for deeper
buffering clearly doesn't use maven.

What I am doing is moving my centos VM off the laptop and into rackspace
cloud. That saves RAM for the IDE, and as I'm testing things in the same
infrastructure, it gives me the ability to deploy artifacts at gigabit
rates. I just use git as a way of syncing source.

One thing I am debating - again on rackspace - is to set up Jenkins on
yet-another-VM, polling aggressively, and automatically running the full
test suite every half hour. That way, it does the full regression testing
on all changes on my branch, while I focus on the one or two tests that I
care about. That's something I discussed a way back:
https://docs.google.com/document/d/16v4SFYC6WSB-Y-B0Uo3IEhEhEN9pPYrtvhtR8KW9ouI/edit
It's only now that I'm sitting down and really doing it - git & github
make a difference, as I can have my own personal branches for the CI
tooling to play with.

Has anyone else tried anything like this?
-
Re: development environment for hadoop core
Glen Mazza 2013-01-16, 13:31

On 01/15/2013 06:50 PM, Erik Paulson wrote:
> Hello -
>
> I'm curious what Hadoop developers use for their day-to-day hacking on
> Hadoop. I'm talking changes to the Hadoop libraries and daemons, and not
> developing Map-Reduce jobs or using the HDFS Client libraries to talk
> to a filesystem from an application.
>
> I've checked out Hadoop, made minor changes and built it with Maven, and
> tracked down the resulting artifacts in a target/ directory that I could
> deploy. Is this typically how a cloudera/hortonworks/mapr/etc dev works, or
> are the IDEs more common?

I haven't built Hadoop yet myself. Your use of "a" in "a target/
directory" indicates you're also kind of new with Maven itself, as that's
the standard output folder for any Maven project. One of many nice things
about Maven is once you learn how to build one project with it you pretty
much know how to build any project with it, as everything's standardized
with it.

Probably best to stick with the command line for building and use Eclipse
for editing, to keep things simple, but don't forget the mvn
eclipse:eclipse command to set up Eclipse projects that you can
subsequently import into your Eclipse IDE:
http://www.jroller.com/gmazza/entry/web_service_tutorial#EclipseSetup

> I realize this sort of sounds like a dumb question, but I'm mostly curious
> what I might be missing out on if I stay away from anything other than vim,
> and not being entirely sure where maven might be caching jars that it uses
> to build,

That will be your local Maven repository, in an .m2 hidden folder in your
user home directory.

> and how careful I have to be to ensure that my changes wind up in
> the right places without having to do a clean build every time.

Maven can detect changes (using mvn install instead of mvn clean install),
but I prefer doing clean builds. You can use the -Dmaven.test.skip setting
to speed up your "mvn clean installs" if you don't wish to run the tests
each time.

HTH,
Glen

> Thanks!
>
> -Erik

--
Glen Mazza
Talend Community Coders - coders.talend.com
blog: www.jroller.com/gmazza
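To see what the local repository Glen describes is holding for a given project, something like the sketch below can help. It assumes the default repository location under the home directory (a custom localRepository in settings.xml would move it); the function name is made up for illustration.

```shell
# List jars cached in the local Maven repository for one groupId.
# Assumes the default location ~/.m2/repository.
# $1 = groupId, e.g. org.apache.hadoop
list_cached_jars() {
  # A groupId maps to a directory path with dots replaced by slashes.
  group_path=$(printf '%s' "$1" | tr '.' '/')
  find "${HOME}/.m2/repository/${group_path}" -name '*.jar' 2>/dev/null | sort
}

# Usage:
#   list_cached_jars org.apache.hadoop
```

On the two skip flags that appear in this thread: -DskipTests compiles the test classes but does not run them, while -Dmaven.test.skip=true skips compiling them as well, which matters if the goal is an up-to-date tests jar.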
-
Re: development environment for hadoop core
Erik Paulson 2013-01-21, 16:36

On Wed, Jan 16, 2013 at 7:31 AM, Glen Mazza <[EMAIL PROTECTED]> wrote:
> On 01/15/2013 06:50 PM, Erik Paulson wrote:
>
>> Hello -
>>
>> I'm curious what Hadoop developers use for their day-to-day hacking on
>> Hadoop. I'm talking changes to the Hadoop libraries and daemons, and not
>> developing Map-Reduce jobs or using the HDFS Client libraries to
>> talk to a filesystem from an application.
>>
>> I've checked out Hadoop, made minor changes and built it with Maven, and
>> tracked down the resulting artifacts in a target/ directory that I could
>> deploy. Is this typically how a cloudera/hortonworks/mapr/etc dev works,
>> or are the IDEs more common?
>
> I haven't built Hadoop yet myself. Your use of "a" in "a target/
> directory" indicates you're also kind of new with Maven itself, as that's
> the standard output folder for any Maven project. One of many nice things
> about Maven is once you learn how to build one project with it you pretty
> much know how to build any project with it, as everything's standardized
> with it.
>
> Probably best to stick with the command line for building and use Eclipse
> for editing, to keep things simple, but don't forget the mvn
> eclipse:eclipse command to set up Eclipse projects that you can
> subsequently import into your Eclipse IDE:
> http://www.jroller.com/gmazza/entry/web_service_tutorial#EclipseSetup
>
>> I realize this sort of sounds like a dumb question, but I'm mostly curious
>> what I might be missing out on if I stay away from anything other than
>> vim, and not being entirely sure where maven might be caching jars that
>> it uses to build,
>
> That will be your local Maven repository, in an .m2 hidden folder in your
> user home directory.
>
>> and how careful I have to be to ensure that my changes wind up in
>> the right places without having to do a clean build every time.
>
> Maven can detect changes (using mvn install instead of mvn clean install),
> but I prefer doing clean builds. You can use the -Dmaven.test.skip setting
> to speed up your "mvn clean installs" if you don't wish to run the tests
> each time.

Thanks to everyone for their advice last week, it's been helpful.

You're spot-on that I'm new to Maven, but I'm a little confused as to
which of the different targets/goals are best to use. Here's my scenario.

What I'd like to get working is the DataNodeCluster, which lives in the
tests.

Running it from hadoop-hdfs-project/hadoop-hdfs/target as

    hadoop jar ./hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar org.apache.hadoop.hdfs.DataNodeCluster -n 2

blows up with an NPE inside of MiniDFSCluster - the offending line is
'dfsdir = conf.get(HDFS_MINIDFS_BASEDIR, null);' (line 2078 of
MiniDFSCluster.java).

I'm not worried about being able to figure out what's wrong (I'm pretty
sure it's that conf is still null when this gets called) - what I'm trying
to use this as is a way to understand what gets built when.

Just to check, I added a System.out.println one line before 2078 of
MiniDFSCluster, and recompiled from hadoop-common/hadoop-hdfs-project with

    mvn package -DskipTests

because I don't want to run all the tests.

This certainly compiles the code - if I leave the semicolon off of my
change the compile fails, even with -DskipTests. However, it doesn't appear
to rebuild

    target/hadoop-hdfs-3.0.0-SNAPSHOT/share/hadoop/hdfs/hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar

- the timestamp is still the old version. It _does_ copy

    target/hadoop-hdfs-3.0.0-SNAPSHOT/share/hadoop/hdfs/hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar

to

    target/hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar

or at least otherwise updates the timestamp on
target/hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar (unless it's copying or
building it from somewhere else - but if it is, it's picking up old
versions of my code).

I only get an updated version if I ask for

    mvn package -Pdist -DskipTests

which is a 3 minute rebuild cycle, even for something as simple as
changing the text in my System.out.println. (Even just a
mvn package -DskipTests with no changes to any source code is a 40 second
operation.)

I haven't sat around and waited for 'mvn package' to run and fire off the
test suite. I don't know if that would result in an updated
hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar being built.

So, my questions are:

- Is there a better maven target to use if I just want to update code in
  MiniDFSCluster.java and run DataNodeCluster, all of which wind up in
  -tests.jar? ('better' here means a shorter build cycle. I'm a terrible
  programmer so finding errors quickly is a priority for me :)
- Is it worth being concerned that 'mvn package' on what should be a no-op
  takes as long as it does?

I'll sort out the NPE in DataNodeCluster and file appropriate JIRAs.

(This is all on the trunk - git show-ref is
2fc22342f44055ae4a2b526408de7524bf1f9215 HEAD, so the trunk as of last
Wednesday)

Thanks!

-Erik
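Rather than comparing jar timestamps by eye, the check Erik describes can be scripted by asking find for sources modified after the jar was written. This is a small sketch under assumptions: the function name is invented, and the paths in the usage comment are only an example of how it might be pointed at the tests jar discussed above.

```shell
# Print the .java files under a source directory that were modified after
# the given jar was last written. Empty output suggests the jar is at
# least as new as every source file.
# $1 = path to the jar, $2 = source directory to scan.
sources_newer_than_jar() {
  find "$2" -name '*.java' -newer "$1" | sort
}

# Usage (paths are an example, relative to hadoop-hdfs-project/hadoop-hdfs):
#   sources_newer_than_jar target/hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar src/test/java
```

A timestamp comparison only shows that a jar predates an edit; it cannot tell whether a newer jar was rebuilt from the edited sources or merely copied from an older one, which is exactly the ambiguity raised in the message.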
-
Re: development environment for hadoop core
Colin McCabe 2013-01-21, 18:31

Hi Erik,

Eclipse can run junit tests very rapidly. If you want a shorter test
cycle, that's one way to get it.

There is also Maven-shell, which reduces some of the overhead of starting
Maven. But I haven't used it so I can't really comment.

cheers,
Colin
-
Re: development environment for hadoop core
Gopal Vijayaraghavan 2013-01-16, 14:17

Not quite an advanced developer, but I learnt some shortcuts for my dev
cycle along the way.

> I've checked out Hadoop, made minor changes and built it with Maven, and
> tracked down the resulting artifacts in a target/ directory that I could
> deploy. Is this typically how a cloudera/hortonworks/mapr/etc dev works, or
> are the IDEs more common?

I mostly stuck to vim for my editor, with a few exceptions (Eclipse is
great for browsing class to class) & mvn eclipse:eclipse works great.

I end up doing

    mvn package -Pdist

That gives you a hadoop-dist/target/hadoop-${version} to work from.

From then on, the mini-cluster is your friend.

http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CLIMiniCluster.html

All I usually specify is -rmport 8032

The next thing I learnt was that for most dev work, file:/// works great
instead of hdfs.

For instance in hive, I could just give

    -hiveconf fs.default.name=file://$(FS)/ -hiveconf hive.metastore.warehouse.dir=file://$(FS)/warehouse

(of course, substituting FS for something useful like /tmp/hive/) and run
my queries without worrying about HDFS overheads.

Using file:/// urls for map input and output occasionally simplifies your
debugging a lot.

So basically, you could run

    ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount file:///usr/share/dict/words file:///tmp/run1

Or you could just use localhost:9000 in the minicluster if you really want
to test out the HDFS client ops.

Figuring out how to run hadoop in the non-cluster mode has been the most
productivity-inducing thing I've learnt.

Hope that helps.

> I realize this sort of sounds like a dumb question, but I'm mostly curious
> what I might be missing out on if I stay away from anything other than vim,
> and not being entirely sure where maven might be caching jars that it uses
> to build, and how careful I have to be to ensure that my changes wind up in
> the right places without having to do a clean build every time.

find ~/.m2/ helps a bit, but occasionally when I do break the API of
something basic like Writable, I want to use my version of the hadoop libs
for that project.

So, this is a question I have for everyone else.

How do I change the hadoop version of an entire build, so that I can
name it something unique & use it in other builds in maven (-SNAPSHOT
doesn't cut it, since occasionally mvn will download the hadoop snap
poms from the remote repos)?

Cheers,
Gopal
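The file:/// wordcount run Gopal describes can be wrapped so the input and output locations are easy to vary between debugging runs. A rough sketch, assuming it is invoked from the root of an unpacked hadoop-dist/target/hadoop-${version} directory; the function name and the `run` helper with its `DRY_RUN` switch are illustrative additions.

```shell
# Echo the command instead of executing it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run the bundled wordcount example against the local filesystem.
# $1 = examples jar, $2 = absolute input path,
# $3 = absolute output path (the job fails if it already exists).
local_wordcount() {
  run ./bin/hadoop jar "$1" wordcount "file://$2" "file://$3"
}

# Usage (the jar name depends on the version that was built):
#   local_wordcount share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-SNAPSHOT.jar \
#     /usr/share/dict/words /tmp/run1
```

Because both paths are absolute, prefixing them with file:// yields the three-slash file:/// form used in the message.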
-
Re: development environment for hadoop core
Hitesh Shah 2013-01-16, 19:18

On Jan 16, 2013, at 6:17 AM, Gopal Vijayaraghavan wrote:

> So, this is a question I have for everyone else.
>
> How do I change the hadoop version of an entire build, so that I can
> name it something unique & use it in other builds in maven (-SNAPSHOT
> doesn't cut it, since occasionally mvn will download the hadoop snap
> poms from the remote repos).

The following should work (from
http://wiki.apache.org/hadoop/HowToReleasePostMavenization):

$ export version=3.0.0-TEST1
$ mvn versions:set -DnewVersion=${version}

-- Hitesh
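Putting Hitesh's versions:set command together with the "mvn install -DskipTests" step mentioned earlier in the thread gives one possible way to publish a uniquely named build into the local repository for other projects to depend on. This is a sketch, not a verified recipe for the Hadoop tree; the function name and the `run` helper with its `DRY_RUN` switch are hypothetical.

```shell
# Echo the command instead of executing it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Stamp every module with a private version, then install the result into
# the local Maven repository. Run from the root of the source tree.
# $1 = the unique version string to use.
install_private_version() {
  run mvn versions:set "-DnewVersion=$1" &&
    run mvn install -DskipTests
}

# Usage:
#   install_private_version 3.0.0-TEST1
```

A version without the -SNAPSHOT suffix avoids the remote snapshot lookups Gopal ran into, but versions:set rewrites the pom.xml files in place, so those changes need to be kept out of any patch that gets submitted.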