Re: development environment for hadoop core
Erik Paulson 2013-01-21, 16:36
On Wed, Jan 16, 2013 at 7:31 AM, Glen Mazza <[EMAIL PROTECTED]> wrote:
> On 01/15/2013 06:50 PM, Erik Paulson wrote:
>> Hello -
>> I'm curious what Hadoop developers use for their day-to-day hacking on
>> Hadoop. I'm talking changes to the Hadoop libraries and daemons, and not
>> developing Map-Reduce jobs or using the HDFS Client libraries to
>> to a filesystem from an application.
>> I've checked out Hadoop, made minor changes and built it with Maven, and
>> tracked down the resulting artifacts in a target/ directory that I could
>> deploy. Is this typically how a cloudera/hortonworks/mapr/etc dev works, or
>> are IDEs more common?
> I haven't built Hadoop yet myself. Your use of "a" in "a target/
> directory" indicates you're also kind of new with Maven itself, as that's
> the standard output folder for any Maven project. One of many nice things
> about Maven is once you learn how to build one project with it you pretty
> much know how to build any project with it, as everything's standardized
> with it.
> Probably best to stick with the command line for building and use Eclipse
> for editing, to keep things simple, but don't forget the mvn
> eclipse:eclipse command to set up Eclipse projects that you can
> subsequently import into your Eclipse IDE: http://www.jroller.com/gmazza/*
>> I realize this sort of sounds like a dumb question, but I'm mostly curious
>> what I might be missing out on if I stay away from anything other than
>> and not being entirely sure where maven might be caching jars that it uses
>> to build,
> That will be your local Maven repository, in an .m2 hidden folder in your
> user home directory.
>> and how careful I have to be to ensure that my changes wind up in
>> the right places without having to do a clean build every time.
> Maven can detect changes (using mvn install instead of mvn clean install),
> but I prefer doing clean builds. You can use the -Dmaven.test.skip setting
> to speed up your "mvn clean installs" if you don't wish to run the tests
> each time.
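For reference, a minimal sketch of the two points quoted above: where the local Maven repository lives, and how the incremental and clean builds differ. It assumes the default repository location (no custom localRepository in ~/.m2/settings.xml); the helper name maven_local_repo is made up for illustration.

```shell
#!/usr/bin/env bash
# Print the default local Maven repository path.
# Assumption: ~/.m2/settings.xml does not override <localRepository>.
maven_local_repo() {
    printf '%s\n' "${HOME}/.m2/repository"
}

# The two build variants from the quoted advice (shown, not executed here):
#   mvn install -Dmaven.test.skip=true         # incremental, reuses target/
#   mvn clean install -Dmaven.test.skip=true   # wipes target/ first
# Note: -Dmaven.test.skip=true skips compiling the tests as well as running
# them, while -DskipTests still compiles them and only skips running them.

repo="$(maven_local_repo)"
if [ -d "$repo/org/apache/hadoop" ]; then
    # Hadoop artifacts that "mvn install" has already placed in the cache
    ls "$repo/org/apache/hadoop"
else
    echo "no Hadoop artifacts under $repo yet"
fi
```

The difference between the two skip flags matters for code that lives in the test sources, since only -DskipTests still compiles it.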
Thanks to everyone for their advice last week; it's been helpful.
You're spot-on that I'm new to Maven, but I'm a little confused as to which
of the different targets/goals are best to use. Here's my scenario.
What I'd like to get working is the DataNodeCluster, which lives in the
Running it from hadoop-hdfs-project/hadoop-hdfs/target as
'hadoop jar ./hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar
blows up with a NPE inside of MiniDFSCluster - the offending line is
'dfsdir = conf.get(HDFS_MINIDFS_BASEDIR, null);' (line 2078 of
I'm not worried about being able to figure out what's wrong (I'm pretty
sure it's that conf is still null when this gets called). What I'm trying
to do is use this as a way to understand what gets built when.
Just to check, I added a System.out.println one line before 2078 of
MiniDFSCluster, and recompiled from hadoop-common/hadoop-hdfs-project with
mvn package -DskipTests
Because I don't want to run all the tests.
This certainly compiles the code - if I leave the semicolon off of my
change the compile fails, even with -DskipTests. However, it doesn't appear
- the timestamp is still the old version.
It _does_ copy
to target/hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar, or at least otherwise
update the timestamp on target/hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar (unless
it's copying or building it from somewhere else - but if it is, it's
picking up old versions of my code)
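The timestamp comparison I've been doing by eye can be sketched as a small shell helper. This is only an illustration: jar_is_stale is a made-up name, and the paths passed to it are whatever source file and jar you want to compare.

```shell
#!/usr/bin/env bash
# Succeed (exit status 0) when the jar is missing, or when the source file
# was modified more recently than the jar, meaning the jar cannot contain
# the latest edit. Fail (non-zero) when the jar is at least as new.
jar_is_stale() {
    local src="$1" jar="$2"
    [ ! -e "$jar" ] || [ "$src" -nt "$jar" ]
}

# Optional command-line use: script.sh <source-file> <jar>
if [ "$#" -eq 2 ]; then
    if jar_is_stale "$1" "$2"; then
        echo "stale: $2 is older than $1 (or missing)"
    else
        echo "fresh: $2 is at least as new as $1"
    fi
fi
```

A jar's own timestamp only says when the archive was written; running jar tvf on it and looking for the class of interest shows the timestamp of the entry inside, which is a closer check on whether a recompiled class actually made it in.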
I only get an updated version if I ask for
mvn package -Pdist -DskipTests
Which is a 3 minute rebuild cycle, even for something as simple as changing
the text in my System.out.println. (Even just a mvn package -DskipTests
with no changes to any source code is a 40 second operation)
I haven't sat around and waited for 'mvn package' to run and fire off the
test suite. I don't know if that would result in an updated
So, my question is:
- Is there a better maven target to use if I just want to update code in
MiniDFSCluster.java and run DataNodeCluster, all of which wind up in
-tests.jar? ('better' here means a shorter build cycle. I'm a terrible
programmer so finding errors quickly is a priority for me :)
- is it worth being concerned that 'mvn package' on what should be a no-op
takes as long as it does?
I'll sort out the NPE in DataNodeCluster and file appropriate JIRAs. (This
is all on the trunk - git show-ref is
2fc22342f44055ae4a2b526408de7524bf1f9215 HEAD, so the trunk as of last