In larger clusters it is better to have an edge (client) node where all the user jars reside and from which you trigger your MapReduce jobs.
A client/edge node is a server that has the Hadoop jars and configuration installed but hosts no Hadoop daemons.
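
As a rough sketch of what triggering a job from the edge node looks like, the Python helper below only assembles the standard `hadoop jar` invocation, optionally with `--config` pointing at the edge node's client configuration directory. The jar path, main class and conf directory used with it are hypothetical placeholders; the command would be run with `subprocess.run` on the edge node itself.

```python
import subprocess
from typing import List, Optional, Sequence


def build_hadoop_jar_cmd(jar_path: str,
                         main_class: Optional[str] = None,
                         args: Sequence[str] = (),
                         conf_dir: Optional[str] = None) -> List[str]:
    """Assemble `hadoop [--config DIR] jar JAR [MAIN_CLASS] ARGS...`."""
    cmd = ["hadoop"]
    if conf_dir:
        # Point the client at this edge node's cluster configuration.
        cmd += ["--config", conf_dir]
    cmd += ["jar", jar_path]
    if main_class:
        # Only needed when the jar's manifest does not name a Main-Class.
        cmd.append(main_class)
    cmd += list(args)
    return cmd


def submit(cmd: List[str]) -> int:
    # Runs on the edge node; raises CalledProcessError on a non-zero exit.
    return subprocess.run(cmd, check=True).returncode
```

For example, `build_hadoop_jar_cmd("/home/dev/jobs/wordcount.jar", "org.example.WordCount", ["/user/dev/in", "/user/dev/out"], conf_dir="/etc/hadoop/conf")` builds the argument list without touching the cluster, which keeps the jar on the edge node's local disk until submission.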
In smaller clusters one DataNode (DN) might also act as the client node, and you can execute your jars from there. The risk is that this DN fills up if files are copied to HDFS from it: under the default block placement policy, one replica of every block is always written to the local node.
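
To make the placement point concrete, here is a deliberately simplified Python simulation of the default rule: first replica on the writer's node when the writer is a DataNode (otherwise a random node), second on a different rack, third on the second replica's rack. It ignores node load and free space, which real HDFS also weighs, and the six-node, two-rack topology is made up. Writing from `dn1` puts a replica of every block on `dn1`, while writing from a non-DataNode host such as `edge1` spreads them out.

```python
import random
from collections import Counter
from typing import Dict, List

# Hypothetical topology: node name -> rack id.
RACKS: Dict[str, str] = {
    "dn1": "/r1", "dn2": "/r1", "dn3": "/r1",
    "dn4": "/r2", "dn5": "/r2", "dn6": "/r2",
}


def place_block(writer: str, rack_of: Dict[str, str],
                rng: random.Random) -> List[str]:
    """Pick 3 target DataNodes for one block (simplified default policy)."""
    nodes = sorted(rack_of)
    # 1st replica: the writer itself if it is a DataNode, else any node.
    first = writer if writer in rack_of else rng.choice(nodes)
    # 2nd replica: a node on a different rack (fall back to any other node).
    remote = [n for n in nodes if rack_of[n] != rack_of[first]]
    second = rng.choice(remote or [n for n in nodes if n != first])
    # 3rd replica: another node on the 2nd replica's rack, if possible.
    same = [n for n in nodes
            if rack_of[n] == rack_of[second] and n not in (first, second)]
    third = rng.choice(same or [n for n in nodes if n not in (first, second)])
    return [first, second, third]


def replica_counts(writer: str, rack_of: Dict[str, str],
                   blocks: int, seed: int = 0) -> Counter:
    """Count how many replicas each node receives for `blocks` blocks."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(blocks):
        counts.update(place_block(writer, rack_of, rng))
    return counts


if __name__ == "__main__":
    print("writer dn1:  ", dict(replica_counts("dn1", RACKS, 100)))
    print("writer edge1:", dict(replica_counts("edge1", RACKS, 100)))
```

With `dn1` as the writer, `dn1` holds a copy of all 100 blocks, so its disk grows with every upload; with `edge1` it ends up with only a share of them.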
With Oozie you put your executables into HDFS, but Oozie comes in at the integration stage. In the initial development phase, developers put their jar on the local file system (LFS) of the client node and execute and test their code from there.
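
For the Oozie route, a workflow application is a directory in HDFS holding `workflow.xml`, and jars placed in its `lib/` subdirectory are added to the action classpath automatically. The Python sketch below checks a list of HDFS paths (for example, taken from a recursive `hadoop fs -ls` listing; the parsing is left out) against that layout before submission. The app path is hypothetical, and the check assumes jars sit directly under `lib/`; jars kept elsewhere (e.g. referenced via `oozie.libpath`) are reported rather than treated as fatal.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def check_oozie_app(app_dir: str, hdfs_paths: Iterable[str]) -> List[str]:
    """Return human-readable problems with a workflow app layout (empty = OK)."""
    root = PurePosixPath(app_dir)
    rel = set()
    for p in hdfs_paths:
        try:
            # Keep only paths inside the app dir, made relative to it.
            rel.add(PurePosixPath(p).relative_to(root))
        except ValueError:
            continue
    problems: List[str] = []
    if PurePosixPath("workflow.xml") not in rel:
        problems.append("missing workflow.xml")
    jars = sorted(r for r in rel if r.suffix == ".jar")
    if not jars:
        problems.append("no .jar files found")
    for jar in jars:
        if jar.parent != PurePosixPath("lib"):
            # Not picked up automatically; must be referenced another way.
            problems.append(f"jar outside lib/: {jar}")
    return problems
```

An empty list means the directory looks like a deployable app, i.e. one that `oozie.wf.application.path` could point at.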
From: Chris Embree <[EMAIL PROTECTED]>
Date: Tue, 22 Jan 2013 14:24:40
To: <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Subject: Where do/should .jar files live?
This should be a simple question, I think. Disclosure: I am not a Java
We're getting ready to build our Dev and Prod clusters. I'm pretty
comfortable with HDFS and how it sits atop several local file systems on
multiple servers. I'm fairly comfortable with the concept of Map/Reduce
and why it's cool and we want it.
Now for the question. Where should my developers put and store their jar
files? Or, asked another way, what's the best entry point for submitting
We have separate physical systems for NN, Checkpoint Node (formerly 2NN),
JobTracker and Standby NN. Should I run jobs from the JT node? Do I keep all
of my finished .jar files on the JT local file system?
Or should I expect that jobs will be run via Oozie? Do I put jars on the
local Oozie FS?
Thanks in advance.