Re: development environment for hadoop core
Not quite an advanced developer, but I learnt some shortcuts for my dev
cycle along the way.

> I've checked out Hadoop, made minor changes and built it with Maven, and
> tracked down the resulting artifacts in a target/ directory that I could
> deploy. Is this typically how a cloudera/hortonworks/mapr/etc dev works, or
> are the IDEs more common?

I mostly stuck to vim for my editor, with a few exceptions (Eclipse is
great for browsing from class to class, and mvn eclipse:eclipse generates
its project files nicely).

I end up doing mvn package -Pdist

That gives you a hadoop-dist/target/hadoop-${version} directory to work from.
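For example (-DskipTests is my own addition to speed the build up, and
the version in the path depends on your branch):

mvn package -Pdist -DskipTests
ls hadoop-dist/target/hadoop-*/
bin  etc  include  lib  libexec  sbin  share  ...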

From then on, the mini-cluster is your friend.

http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CLIMiniCluster.html

All I usually specify is -rmport 8032.
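Concretely, the launch looks something like this (straight from the doc
above; the jar name/version will vary with your build):

./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar minicluster -rmport 8032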

The next thing I learnt was that for most dev work, file:/// works
great instead of HDFS.

For instance, in Hive I could just pass

-hiveconf fs.default.name=file://$(FS)/
-hiveconf hive.metastore.warehouse.dir=file://$(FS)/warehouse

(of course, substituting FS with something useful like /tmp/hive/)

and run my queries without worrying about HDFS overheads.
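Expanded out with FS=/tmp/hive/, that's just:

hive -hiveconf fs.default.name=file:///tmp/hive/ \
  -hiveconf hive.metastore.warehouse.dir=file:///tmp/hive/warehouse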

Using file:/// URLs for map input and output occasionally simplifies
your debugging a lot.

So basically, you could run

./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount \
  file:///usr/share/dict/words file:///tmp/run1

Or you could just use localhost:9000 in the minicluster if you really
want to test out the HDFS client ops.
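e.g. something like this (assuming you pinned the namenode port with
-nnport 9000 when starting the minicluster; -fs is the generic option
for overriding the default filesystem):

./bin/hadoop fs -fs hdfs://localhost:9000 -put /usr/share/dict/words /words
./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount \
  -fs hdfs://localhost:9000 /words /run1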

Figuring out how to run Hadoop in non-cluster mode has been the
most productivity-inducing thing I've learnt.

Hope that helps.

> I realize this sort of sounds like a dumb question, but I'm mostly curious
> what I might be missing out on if I stay away from anything other than vim,
> and not being entirely sure where maven might be caching jars that it uses
> to build, and how careful I have to be to ensure that my changes wind up in
> the right places without having to do a clean build every time.

find ~/.m2/ helps a bit, but occasionally when I do break the API of
something basic like Writable, I want to use my version of the hadoop
libs for that project.
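(To be concrete about the caching bit: the local repo is laid out by
groupId/artifactId/version, so something like

find ~/.m2/repository/org/apache/hadoop -name 'hadoop-common-*.jar'

shows you exactly which cached copies a downstream build could be
picking up.)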

So, this is a question I have for everyone else.

How do I change the Hadoop version of an entire build, so that I can
name it something unique & use it in other builds in Maven? (-SNAPSHOT
doesn't cut it, since occasionally mvn will download the Hadoop snapshot
POMs from the remote repos.)
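i.e. something along these lines (hypothetical version string;
versions:set comes from the versions-maven-plugin):

mvn versions:set -DnewVersion=2.0.3-gopal
mvn install -DskipTests

and then depending on 2.0.3-gopal from the other project -- if that's
even the right way to go about it.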

Cheers,
Gopal