Hadoop >> mail # dev >> How to run hadoop jar command in a clustered environment


Thoihen Maibam 2013-04-15, 16:36
Chris Nauroth 2013-04-15, 17:32
Re: How to run hadoop jar command in a clustered environment
@Chris thanks a lot, that helped a lot.
On Mon, Apr 15, 2013 at 11:02 PM, Chris Nauroth <[EMAIL PROTECTED]> wrote:

> Hello Thoihen,
>
> I'm moving this discussion from common-dev (questions about developing
> Hadoop) to user (questions about using Hadoop).
>
> If you haven't already seen it, then I recommend reading the cluster setup
> documentation.  It's a bit different depending on the version of the Hadoop
> code that you're deploying and running.  You mentioned JobTracker, so I
> expect that you're using something from the 1.x line, but here are links to
> both 1.x and 2.x docs just in case:
>
> 1.x: http://hadoop.apache.org/docs/r1.1.2/cluster_setup.html
> 2.x/trunk:
>
> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html
>
> To address your specific questions:
>
> 1. You can run the hadoop jar command and submit MapReduce jobs from any
> machine that has the Hadoop software and configuration deployed and has
> network connectivity to the machines that make up the Hadoop cluster.
>
> 2. Yes, you can use a separate machine that is not a member of the cluster
> (meaning it does not run Hadoop daemons like DataNode, TaskTracker, or
> NodeManager).  This is your choice.  I've found it valuable to isolate
> nodes like this to prevent MR job tasks from taking processing resources
> away from interactive user commands, but this does mean that the resources
> on that node can't be utilized by MR jobs during user idle times, so it
> causes a small hit to overall utilization.
>
> Hope this helps,
> --Chris
>
>
> On Mon, Apr 15, 2013 at 9:36 AM, Thoihen Maibam <[EMAIL PROTECTED]> wrote:
>
> > Hi All,
> >
> > I am really new to Hadoop and installed Hadoop on my local Ubuntu machine.
> > I also created a wordcount.jar and started Hadoop with start-all.sh, which
> > started all the Hadoop daemons, and used jps to confirm it. I cd'd into
> > hadoop/bin, ran hadoop jar x.jar, and the MapReduce program ran
> > successfully.
> >
> > Now, can someone please help me with how I should run the hadoop jar
> > command in a clustered environment, say for example a cluster with 50
> > nodes? I know one dedicated machine would be the NameNode, another the
> > JobTracker, and the rest DataNodes and TaskTrackers.
> >
> > 1. From which machine should I run the hadoop jar command, considering I
> > have a MapReduce jar in hand? Should I run it from the JobTracker machine,
> > or can I run it from any machine in the cluster?
> >
> > 2. Can I run the MapReduce job from another machine which is not part of
> > the cluster? If yes, how should I do it?
> >
> > Please help me.
> >
> > Regards
> > thoihen
> >
>
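
To make point 1 from the reply concrete, here is a minimal sketch of submitting a job with hadoop jar from any machine that has the Hadoop 1.x software and the cluster configuration deployed. The main class name, HDFS paths, and input file are hypothetical placeholders, not taken from the thread:

    # Run from any configured client machine; the same commands work on a cluster node.
    hadoop fs -mkdir /user/thoihen/input                 # create an input directory in HDFS
    hadoop fs -put local-input.txt /user/thoihen/input   # stage the input data
    hadoop jar wordcount.jar WordCount /user/thoihen/input /user/thoihen/output
    hadoop fs -cat /user/thoihen/output/part-*           # inspect the job output

The map and reduce tasks still execute on the cluster's TaskTrackers; the client machine only builds the job configuration, ships the jar, and submits the job to the JobTracker.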
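
For point 2, a sketch of the client-side configuration that lets a machine outside the cluster submit jobs, assuming Hadoop 1.x and hypothetical host names and ports (namenode.example.com:9000, jobtracker.example.com:9001). Only the Hadoop software and these config files need to be present on that machine; no DataNode or TaskTracker daemon runs there:

    <!-- conf/core-site.xml on the client: where the cluster's HDFS lives (hypothetical host/port) -->
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://namenode.example.com:9000</value>
      </property>
    </configuration>

    <!-- conf/mapred-site.xml on the client: where MapReduce jobs are submitted (hypothetical host/port) -->
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>jobtracker.example.com:9001</value>
      </property>
    </configuration>

With these in place, hadoop jar can be run from that machine just as it would be on a cluster node.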