Your understanding of clients is correct. To talk to HDFS, they need
to be allowed to reach the NameNode (NN) ports as well as the DataNode
(DN) ports. To talk to YARN/MR, they need access to both the
ResourceManager (RM) and NodeManager (NM) ports, as well as the
JobHistoryServer's web port.
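A quick way to confirm a client machine can actually reach those
services is a plain TCP connectivity probe. The sketch below assumes
hypothetical host names, and its port numbers are commonly used Hadoop
2.x defaults rather than values taken from your cluster, so check them
against your own *-site.xml files. Some NodeManager-side ports may also
be assigned dynamically, so a probe like this only covers the fixed ones.

```python
import socket
from typing import Dict, Iterable, List, Tuple

# Hypothetical host names; replace them with your cluster's real hosts.
# Ports are common Hadoop 2.x defaults (an assumption): verify them
# against your cluster's core-site.xml, hdfs-site.xml, yarn-site.xml
# and mapred-site.xml.
ENDPOINTS: List[Tuple[str, str, int]] = [
    ("NameNode RPC", "nn.example.com", 8020),
    ("NameNode web UI", "nn.example.com", 50070),
    ("DataNode data transfer", "dn1.example.com", 50010),
    ("ResourceManager RPC", "rm.example.com", 8032),
    ("NodeManager web UI", "dn1.example.com", 8042),
    ("JobHistoryServer web UI", "jhs.example.com", 19888),
]


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_endpoints(
    endpoints: Iterable[Tuple[str, str, int]], timeout: float = 2.0
) -> Dict[str, bool]:
    """Probe each (label, host, port) entry and map label -> reachable."""
    return {
        label: can_connect(host, port, timeout)
        for label, host, port in endpoints
    }
```

From the client, `check_endpoints(ENDPOINTS)` returns a dict such as
`{"NameNode RPC": True, ...}`; any `False` entry usually points to a
firewall rule or a wrong host/port rather than a Hadoop problem.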
Besides a local install, they'll also need that install configured to
point to the cluster's service addresses (URLs).
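As an illustration of "pointing the install at the cluster", here is a
minimal sketch that writes client-side *-site.xml files. The property
names (`fs.defaultFS`, `yarn.resourcemanager.address`,
`mapreduce.framework.name`, `mapreduce.jobhistory.address`,
`mapreduce.jobhistory.webapp.address`) are the ones used by Hadoop 2.x;
the host names and ports are hypothetical placeholders. In practice,
copying the cluster's own *-site.xml files to the client is often the
simplest route.

```python
import os
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical cluster addresses; substitute your own hosts and ports.
CLIENT_CONFIG: Dict[str, Dict[str, str]] = {
    "core-site.xml": {
        # Default file system URI: tells the client where the NameNode is.
        "fs.defaultFS": "hdfs://nn.example.com:8020",
    },
    "yarn-site.xml": {
        # ResourceManager address used when submitting applications.
        "yarn.resourcemanager.address": "rm.example.com:8032",
    },
    "mapred-site.xml": {
        # Run MapReduce jobs on YARN instead of the local runner.
        "mapreduce.framework.name": "yarn",
        "mapreduce.jobhistory.address": "jhs.example.com:10020",
        "mapreduce.jobhistory.webapp.address": "jhs.example.com:19888",
    },
}


def render_site_xml(props: Dict[str, str]) -> str:
    """Render properties in Hadoop's <configuration><property> layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


def write_client_config(
    conf_dir: str, config: Dict[str, Dict[str, str]] = CLIENT_CONFIG
) -> None:
    """Write each *-site.xml file into conf_dir (e.g. the client's conf dir)."""
    os.makedirs(conf_dir, exist_ok=True)
    for filename, props in config.items():
        path = os.path.join(conf_dir, filename)
        with open(path, "w", encoding="utf-8") as f:
            f.write(render_site_xml(props) + "\n")
```

With `HADOOP_CONF_DIR` pointing at that directory, commands such as
`hadoop fs -put` or `hadoop jar` run from the client would then target
the cluster instead of a local setup.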
As for 0.23 versus 2.0.3, you can choose either. The 0.23 line is
fast-moving right now: stability improvements to YARN and MR2 are
continuously being added and released (mostly by, and for use at,
Yahoo!). The 2.0.x line has a somewhat wider release window, with new
features and possible incompatibilities still coming in until it
reaches beta; it also carries the HDFS HA features, plus protobuf-based
protocols (which 0.23 lacks). Eventually, the 0.23 line will stop and
its users will move over to 2.x once the latter stabilizes.
Upgrade-wise, you can go either 1.x -> 2.x directly, or 1.x -> 0.23.x
-> 2.x (for as long as the 0.23 line lasts); both routes are supported.
On Wed, Mar 20, 2013 at 11:28 PM, Marcel Mitsuto F. S.
<[EMAIL PROTECTED]> wrote:
> I'm starting a project to build a 10 node cluster grid.
> I've already successfully built a 10 node grid with hadoop 1.0.4.
> This next grid would preferably be on the 0.23.X branch, which I think
> would be the best version for a smooth transition to the 2.0.3 release
> (right?)
> When I was working with the 1.0.4 proof-of-concept, I was scratching my head
> about the role of the 'clients' that submit jobs to the cluster; all the
> `hadoop fs -put` work I was doing directly from the namenode instance.
> So the question: how do I set up a grid where clients can send jobs to the
> cluster in a queued fashion, and how do I set up the 'clients' so they are
> properly recognized by the grid and able to send jobs? Am I correct to
> think that a 'client' could be anyone (e.g. my laptop, on a network that
> reaches the namenode) with access to the cluster and hadoop installed locally?
> Thanks in advance.