Re: development environment for hadoop core
On Wed, Jan 16, 2013 at 7:31 AM, Glen Mazza <[EMAIL PROTECTED]> wrote:

> On 01/15/2013 06:50 PM, Erik Paulson wrote:
>
>> Hello -
>>
>> I'm curious what Hadoop developers use for their day-to-day hacking on
>> Hadoop. I'm talking changes to the Hadoop libraries and daemons, and not
>> developing Map-Reduce jobs or using the HDFS Client libraries to talk
>> to a filesystem from an application.
>>
>> I've checked out Hadoop, made minor changes and built it with Maven, and
>> tracked down the resulting artifacts in a target/ directory that I could
>> deploy. Is this typically how a cloudera/hortonworks/mapr/etc dev works,
>> or
>> are the IDEs more common?
>>
> I haven't built Hadoop yet myself.  Your use of "a" in "a target/
> directory" suggests you're also fairly new to Maven itself, as that's
> the standard output folder for any Maven project.  One of the many nice
> things about Maven is that once you learn how to build one project with
> it, you pretty much know how to build any other, since everything is
> standardized.
>
> Probably best to stick with the command line for building and use Eclipse
> for editing, to keep things simple, but don't forget the mvn
> eclipse:eclipse command to set up Eclipse projects that you can
> subsequently import into your Eclipse IDE:
> http://www.jroller.com/gmazza/entry/web_service_tutorial#EclipseSetup
>
>
>
>> I realize this sort of sounds like a dumb question, but I'm mostly curious
>> what I might be missing out on if I stay away from anything other than
>> vim,
>> and not being entirely sure where maven might be caching jars that it uses
>> to build,
>>
>
> That will be your local Maven repository, in the hidden .m2 folder in your
> user home directory.
>
>
>
>  and how careful I have to be to ensure that my changes wind up in
>> the right places without having to do a clean build every time.
>>
>>
> Maven can detect changes (using mvn install instead of mvn clean install),
> but I prefer doing clean builds.  You can use the -Dmaven.test.skip setting
> to speed up your "mvn clean installs" if you don't wish to run the tests
> each time.
>

Thanks to everyone for their advice last week, it's been helpful.

You're spot-on that I'm new to Maven, but I'm a little confused about which
of the different targets/goals is best to use. Here's my scenario.

What I'd like to get working is the DataNodeCluster, which lives in the
tests.

Running it from hadoop-hdfs-project/hadoop-hdfs/target as
'hadoop jar ./hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar
org.apache.hadoop.hdfs.DataNodeCluster
-n 2'

blows up with an NPE inside MiniDFSCluster - the offending line is
'dfsdir = conf.get(HDFS_MINIDFS_BASEDIR, null);' (line 2078 of
MiniDFSCluster.java).

I'm not worried about figuring out what's wrong (I'm pretty sure it's that
conf is still null when this gets called) - what I'm really using this for
is a way to understand what gets built when.
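For context, the failing pattern looks roughly like this (a sketch based on
the line quoted above; the guard at the end is purely illustrative, not a
proposed fix):

    // Around line 2078 of MiniDFSCluster.java: if conf is still null at
    // this point, the conf.get(...) dereference is what throws the NPE.
    dfsdir = conf.get(HDFS_MINIDFS_BASEDIR, null);

    // Purely illustrative guard (assuming a fresh HdfsConfiguration would
    // be a sane default here - I haven't checked that it is):
    if (conf == null) {
      conf = new HdfsConfiguration();
    }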

Just to check, I added a System.out.println one line before line 2078 of
MiniDFSCluster, and recompiled from hadoop-common/hadoop-hdfs-project with

mvn package -DskipTests

because I don't want to run all the tests.

This certainly compiles the code - if I leave the semicolon off my change,
the compile fails, even with -DskipTests. However, it doesn't appear to
rebuild
target/hadoop-hdfs-3.0.0-SNAPSHOT/share/hadoop/hdfs/hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar
- the timestamp is still the old one.

It _does_ copy
target/hadoop-hdfs-3.0.0-SNAPSHOT/share/hadoop/hdfs/hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar
to target/hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar, or at least otherwise updates
the timestamp on target/hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar (unless it's
copying or building it from somewhere else - but if it is, it's picking up
old versions of my code).
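(One thing I haven't verified: whether invoking the jar plugin's test-jar
goal directly from hadoop-hdfs-project/hadoop-hdfs, i.e.

mvn jar:test-jar

would refresh just the -tests.jar without a full packaging pass. I'm
assuming the test classes are already compiled and the plugin picks them up
from target/test-classes; I haven't actually tried it.)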

I only get an updated version if I ask for

mvn package -Pdist -DskipTests

which is a 3-minute rebuild cycle, even for something as simple as changing
the text in my System.out.println. (Even a mvn package -DskipTests with no
changes to any source code is a 40-second operation.)
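(I'm also wondering whether scoping the build to the one module from the top
level, something like

mvn package -DskipTests -pl hadoop-hdfs-project/hadoop-hdfs

or running Maven offline with -o to skip the remote snapshot/plugin checks,
would shorten the no-op case - I haven't confirmed either on this tree.)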

I haven't sat around and waited for 'mvn package' to run the full test
suite, so I don't know whether that would result in an updated
hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar.

So, my questions are:

- Is there a better Maven target to use if I just want to update code in
MiniDFSCluster.java and run DataNodeCluster, all of which wind up in the
-tests.jar? ('Better' here means a shorter build cycle - I'm a terrible
programmer, so finding errors quickly is a priority for me. :)
- Is it worth being concerned that 'mvn package' on what should be a no-op
takes as long as it does?

I'll sort out the NPE in DataNodeCluster and file the appropriate JIRAs.
(This is all on trunk - git show-ref gives
2fc22342f44055ae4a2b526408de7524bf1f9215 HEAD, so trunk as of last
Wednesday.)

Thanks!

-Erik