Thoihen Maibam 2013-04-15, 16:36
Chris Nauroth 2013-04-15, 17:32
Re: How to run hadoop jar command in a clustered environment
maisnam ns 2013-04-15, 18:07
@Chris, thanks a lot. That helped a lot.
On Mon, Apr 15, 2013 at 11:02 PM, Chris Nauroth <[EMAIL PROTECTED]> wrote:
> Hello Thoihen,
> I'm moving this discussion from common-dev (questions about developing
> Hadoop) to user (questions about using Hadoop).
> If you haven't already seen it, then I recommend reading the cluster setup
> documentation. It's a bit different depending on the version of the Hadoop
> code that you're deploying and running. You mentioned JobTracker, so I
> expect that you're using something from the 1.x line, but here are links to
> both 1.x and 2.x docs just in case:
> 1.x: http://hadoop.apache.org/docs/r1.1.2/cluster_setup.html
> To address your specific questions:
> 1. You can run the hadoop jar command and submit MapReduce jobs from any
> machine that has the Hadoop software and configuration deployed and has
> network connectivity to the machines that make up the Hadoop cluster.
> 2. Yes, you can use a separate machine that is not a member of the cluster
> (meaning it does not run Hadoop daemons like DataNode, TaskTracker, or
> NodeManager). This is your choice. I've found it valuable to isolate
> nodes like this to prevent MR job tasks from taking processing resources
> away from interactive user commands, but this does mean that the resources
> on that node can't be utilized by MR jobs during user idle times, so it
> causes a small hit to overall utilization.
> Hope this helps,
> On Mon, Apr 15, 2013 at 9:36 AM, Thoihen Maibam <[EMAIL PROTECTED]> wrote:
> > Hi All,
> > I am really new to Hadoop and installed hadoop in my local ubuntu
> > I also created a wordcount.jar and started hadoop with start-all.sh which
> > started all the hadoop daemons and used jps to confirm it. Cd to
> > and ran hadoop jar x.jar and successfully ran the map reduce program.
> > Now, can someone please explain how I should run the hadoop jar command
> > in a clustered environment, for example a cluster with 50 nodes? I know
> > that one dedicated machine would be the NameNode, another the JobTracker,
> > and the others DataNodes and TaskTrackers.
> > 1. From which machine should I run the hadoop jar command, considering I
> > have a MapReduce jar in hand? Should I run it from the JobTracker machine,
> > or can I run it from any machine in the cluster?
> > 2. Can I run the MapReduce job from another machine that is not part of
> > the cluster? If yes, how should I do it?
> > Please help me.
> > Regards
> > thoihen
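
For anyone finding this thread later: Chris's first point (a machine with the Hadoop software and configuration deployed, plus network access to the cluster) might look roughly like the sketch below for a 1.x cluster. It only writes a minimal client-side configuration directory and prints the submit command instead of running it. The hostnames, ports, the directory name client-conf, the main class WordCount, and the /input and /output paths are all placeholders assumed for illustration; the real values come from your own cluster's configuration files.

```shell
#!/bin/sh
# Sketch only: build a client-side config directory for a Hadoop 1.x cluster.
# All hostnames, ports, and paths here are hypothetical placeholders.
NAMENODE_URI="${NAMENODE_URI:-hdfs://namenode.example.com:8020}"
JOBTRACKER_ADDR="${JOBTRACKER_ADDR:-jobtracker.example.com:8021}"
CONF_DIR="${CONF_DIR:-./client-conf}"

mkdir -p "$CONF_DIR"

# core-site.xml tells the client where the NameNode (HDFS) is.
cat > "$CONF_DIR/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>$NAMENODE_URI</value>
  </property>
</configuration>
EOF

# mapred-site.xml tells the client where the JobTracker is.
cat > "$CONF_DIR/mapred-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>$JOBTRACKER_ADDR</value>
  </property>
</configuration>
EOF

# Print the submit command rather than running it, so the sketch works
# on a machine without Hadoop installed. WordCount is an assumed class name.
echo "hadoop --config $CONF_DIR jar wordcount.jar WordCount /input /output"
```

In practice it is usually simpler to copy the conf directory from an existing cluster node to the client machine than to write these files by hand, since that keeps the client's settings identical to the cluster's.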