Hadoop, mail # dev - development environment for hadoop core


Re: development environment for hadoop core
Gopal Vijayaraghavan 2013-01-16, 14:17
Not quite an advanced developer, but I learnt some shortcuts for my dev
cycle along the way.

> I've checked out Hadoop, made minor changes and built it with Maven, and
> tracked down the resulting artifacts in a target/ directory that I could
> deploy. Is this typically how a cloudera/hortonworks/mapr/etc dev works, or
> are the IDEs more common?

I mostly stuck to vim for my editor, with a few exceptions (Eclipse is
great for browsing from class to class), and mvn eclipse:eclipse works
great for generating the Eclipse project files.

I end up doing mvn package -Pdist

That gives you a hadoop-dist/target/hadoop-${version} directory to work from.
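
A rough sketch of that build step as a pair of shell helpers. The function
names build_dist and find_dist are made up for illustration, and -DskipTests
is an addition that is not in the message above (it only skips the unit tests
so the package step finishes sooner). Both are meant to be run from the top of
the Hadoop source tree.

```shell
# Build the distribution layout. -DskipTests is an assumption added here
# to shorten the build; drop it to run the full test suite.
build_dist() {
    mvn package -Pdist -DskipTests
}

# Print the unpacked hadoop-<version> directory under hadoop-dist/target.
# $1 is the source root (defaults to the current directory).
find_dist() {
    src_root="${1:-.}"
    # The trailing slash restricts the glob to directories, so jars and
    # tarballs sitting next to the dist directory are ignored.
    for d in "$src_root"/hadoop-dist/target/hadoop-*/; do
        if [ -d "$d" ]; then
            printf '%s\n' "${d%/}"
            return 0
        fi
    done
    echo "no dist directory found; run build_dist first" >&2
    return 1
}
```

Typical use would be something like: build_dist && cd "$(find_dist)"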

From then on, the mini-cluster is your friend.

http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CLIMiniCluster.html

All I usually specify is -rmport 8032
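
Spelled out, the invocation from the linked CLIMiniCluster page looks roughly
like the sketch below. The wrapper name start_minicluster and the HADOOP_BIN
override are hypothetical conveniences. The jar path follows the layout
described on that page and may differ between versions, so check it against
your own dist directory.

```shell
# Start the CLI mini-cluster from inside the dist directory.
# $1 is the ResourceManager port (defaults to 8032, as in the message).
# HADOOP_BIN is a hypothetical override for the hadoop launcher script.
start_minicluster() {
    rmport="${1:-8032}"
    "${HADOOP_BIN:-./bin/hadoop}" jar \
        ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
        minicluster -rmport "$rmport"
}
```

Setting HADOOP_BIN=echo prints the command line instead of running it, which
is handy for checking what the wrapper would do.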

Next thing I learnt was that for most dev work, file:/// works
great in place of hdfs://

for instance in hive, I could just give

-hiveconf fs.default.name=file://$(FS)/
-hiveconf hive.metastore.warehouse.dir=file://$(FS)/warehouse

(of course, replacing $(FS) with something useful like /tmp/hive)

and run my queries without worrying about HDFS overheads.
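
As a concrete sketch, the two settings can be wrapped in a small shell
function. The name hive_local and the HIVE_BIN override are hypothetical, and
the mkdir step is an added convenience. FS is expected to be an absolute path
without a trailing slash and defaults to /tmp/hive.

```shell
# Run hive against the local filesystem instead of HDFS.
# FS is the scratch directory (absolute path, no trailing slash).
# HIVE_BIN is a hypothetical override for the hive launcher.
hive_local() {
    fs="${FS:-/tmp/hive}"
    # Make sure the warehouse directory exists before hive uses it.
    mkdir -p "$fs/warehouse"
    "${HIVE_BIN:-hive}" \
        -hiveconf fs.default.name="file://$fs/" \
        -hiveconf hive.metastore.warehouse.dir="file://$fs/warehouse" \
        "$@"
}
```

Any extra arguments are passed straight through to hive, so a call could look
like: FS=/tmp/hive hive_local -e 'show tables;'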

Using file:/// URLs for map input and output occasionally simplifies
your debugging a lot.

So basically, you could run

./bin/hadoop jar \
    share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount \
    file:///usr/share/dict/words file:///tmp/run1

Or you could just use localhost:9000 in the minicluster if you really
want to test out the HDFS client ops.

Figuring out how to run hadoop in non-cluster mode has been the
most productivity-boosting thing I've learnt.

Hope that helps.

> I realize this sort of sounds like a dumb question, but I'm mostly curious
> what I might be missing out on if I stay away from anything other than vim,
> and not being entirely sure where maven might be caching jars that it uses
> to build, and how careful I have to be to ensure that my changes wind up in
> the right places without having to do a clean build every time.

find ~/.m2/ helps a bit, but occasionally when I do break the API of
something basic like Writable, I want to use my version of the hadoop
libs for that project.
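
For the "where is maven caching jars" part, a small sketch that lists the
hadoop jars in the local repository. The function name is made up for
illustration, and the default path assumes the stock ~/.m2/repository location
with the org.apache.hadoop group id laid out as org/apache/hadoop.

```shell
# List hadoop jars cached in the local maven repository.
# $1 is the repository root (defaults to ~/.m2/repository).
list_cached_hadoop_jars() {
    repo="${1:-$HOME/.m2/repository}"
    # Errors are silenced so a missing directory just yields no output.
    find "$repo/org/apache/hadoop" -name 'hadoop-*.jar' 2>/dev/null | sort
}
```

Comparing the timestamps of those jars with the ones under target/ is one way
to tell whether a build picked up the local changes or a downloaded artifact.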

So, this is a question I have for everyone else.

How do I change the hadoop version of an entire build, so that I can
name it something unique & use it in other builds in maven? (-SNAPSHOT
doesn't cut it, since occasionally mvn will download the hadoop snapshot
poms from the remote repos.)

Cheers,
Gopal