Re: Developing and deploying hadoop
Eric Yang 2012-03-01, 04:57
a) Standard practice is to keep the data directory independent of the
program directory. For example, the software might be installed in
/opt/hadoop/hadoop-1.0 while the data lives in /var/hadoop. When a new
version is available for deployment, it can be deployed to
/opt/hadoop/hadoop-2.0 and use the same /var/hadoop data directory.
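One common way to realize this layout is a versioned install directory
plus a "current" symlink that upgrades flip atomically. This is a sketch
only; the symlink name and the use of a temp dir (so it is safe to run
anywhere) are my own illustration, not something from the thread.

```shell
#!/bin/sh
# Sketch: versioned software under opt/hadoop, data under var/hadoop,
# with a "current" symlink selecting the active release. All paths are
# created under a throwaway temp dir so this is safe to run.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/opt/hadoop/hadoop-1.0" \
         "$ROOT/opt/hadoop/hadoop-2.0" \
         "$ROOT/var/hadoop"

# Point "current" at the old release.
ln -s "$ROOT/opt/hadoop/hadoop-1.0" "$ROOT/opt/hadoop/current"

# Upgrade: re-point the symlink at the new release. The data directory
# under var/hadoop is untouched by the switch.
ln -sfn "$ROOT/opt/hadoop/hadoop-2.0" "$ROOT/opt/hadoop/current"

readlink "$ROOT/opt/hadoop/current"
```

Rolling back is the same operation in reverse: re-point the symlink at
the previous release directory.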
b) It is best to use "ant binary" with the dozen or so other switches
documented in the Hadoop wiki: http://wiki.apache.org/hadoop/HowToRelease.
This reduces the size of the program files by not deploying
documentation and source to all nodes.
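As a concrete illustration of the build step, the sketch below echoes an
example invocation rather than running it, so it works without a Hadoop
source checkout. "ant binary" is from the answer above; the -D switches
shown are illustrative examples only, not a complete list (see the
HowToRelease wiki page for the full set).

```shell
#!/bin/sh
# Sketch: building a binary-only tarball from a Hadoop 1.x source tree.
# Echoed as a dry run; run the printed command from the source root.
build_cmd() {
    # -Dversion and -Dcompile.native are example switches, assumed here
    # for illustration.
    echo "ant binary -Dversion=1.0.1 -Dcompile.native=true"
}
build_cmd
```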
c) There are a couple of deployment systems, such as Ambari, Cloudera
Manager, HMS, and IBM BigInsights. Most of them are free to use for up
to 50 nodes. pdsh with shell scripts works too; in fact, the largest
clusters are deployed with ssh and scp.
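The ssh/scp approach mentioned above can be as simple as a loop over the
node list. In this sketch the host names, tarball name, and target path
are all illustrative, and the commands are echoed (a dry run) so it is
safe to run without a cluster; drop the echo to execute for real.

```shell
#!/bin/sh
# Sketch: push a binary tarball to each node with scp and unpack it over
# ssh. Hosts and paths are placeholders; commands are echoed, not run.
TARBALL=hadoop-1.0.1-bin.tar.gz
deploy() {
    for host in node01 node02 node03; do
        echo "scp $TARBALL $host:/tmp/"
        echo "ssh $host tar -xzf /tmp/$TARBALL -C /opt/hadoop"
    done
}
deploy
```

With pdsh the loop collapses to a single fan-out command per step, which
is why it scales better than plain ssh loops on large clusters.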
d) PREFIX/share/hadoop was introduced in 0.20.204.0. The design maps
closely to the Filesystem Hierarchy Standard, where platform-independent
files are stored in /usr/share. This enables dependent projects to
cross-reference the classpath using relative paths. For example, HBase
may refer to the Hadoop jar files by sourcing PREFIX/share/hadoop/*.jar.
Some projects have adopted this design, and we hope more projects will
switch to this convention.
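To show what "sourcing PREFIX/share/hadoop/*.jar" amounts to in
practice, here is a sketch of building a classpath from that directory.
The PREFIX location and the placeholder jar names are assumptions for
illustration; a temp dir stands in for a real install so the sketch runs
anywhere.

```shell
#!/bin/sh
# Sketch: a dependent project assembling its classpath from the jars in
# PREFIX/share/hadoop. Placeholder jars are created in a temp dir.
PREFIX=$(mktemp -d)
mkdir -p "$PREFIX/share/hadoop"
touch "$PREFIX/share/hadoop/hadoop-core-1.0.1.jar" \
      "$PREFIX/share/hadoop/hadoop-tools-1.0.1.jar"

CLASSPATH=
for jar in "$PREFIX"/share/hadoop/*.jar; do
    CLASSPATH="$CLASSPATH:$jar"
done
CLASSPATH=${CLASSPATH#:}   # strip the leading separator
echo "$CLASSPATH"
```

Because the path is relative to PREFIX, a dependent project installed
alongside Hadoop finds the jars wherever the tree is rooted, without
hard-coding an absolute install location.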
On Wed, Feb 29, 2012 at 6:56 PM, Merto Mertek <[EMAIL PROTECTED]> wrote:
> I would be glad to hear what your development cycle is and how you
> deploy new features to a production cluster. Do you deploy them with
> bash scripts and rsync, ant, maven, or some other automation tool? I
> would be thankful if you could point me to any resource describing best
> practices in developing, deploying, and automating a Java project in a
> unix/linux environment.
> On 13 February 2012 11:26, Merto Mertek <[EMAIL PROTECTED]> wrote:
>> I am interested in some general tips on how to develop and deploy new
>> versions of hadoop. I've been trying to compile a new version of hadoop
>> and place the new jar on the cluster in the lib folder, but it was not
>> picked up even though the classpath was explicitly set to the lib
>> folder. I am interested in the following questions:
>> a) How to deploy a new version? Just copy the new compiled jar file to all
>> lib folders on all nodes?
>> b) Should I make just a new compile or a new release ('ant' vs 'ant tar')?
>> c) How do you develop and deploy hadoop locally, and how remotely? For
>> deploying builds, are you using your own sh scripts or tools like
>> ant/maven?
>> d) What is the purpose of the folder $HADOOP_HOME/share/hadoop?
>> Any other tips are welcome.
>> Thank you