Question regarding access to different hadoop 2.0 cluster (hdfs-dev mailing list)


Thread:
lohit 2013-11-04, 22:02
Suresh Srinivas 2013-11-04, 22:12
lohit 2013-11-04, 22:15
Bobby Evans 2013-11-05, 14:57
Suresh Srinivas 2013-11-05, 22:57
Bobby Evans 2013-11-06, 15:36
Re: Question regarding access to different hadoop 2.0 cluster
We've discussed a few times adding a FailoverProxyProvider which would use
DNS records for this. For example, you'd add a SRV record (or multiple A
records) for the logical name, pointing to the physical hosts backing the
cluster. I think it would help reduce client-side configuration pretty
neatly, though it has the disadvantage that your DNS admins need to get in
the loop.

-Todd
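A hypothetical rendering of the idea above; no such DNS-backed
FailoverProxyProvider existed at the time, and every name and address
here is made up for illustration:

    ; hypothetical zone-file entries for the logical name "hadoop-cluster2"
    ; SRV form: one record per namenode RPC endpoint behind the logical name
    _hdfs._tcp.hadoop-cluster2.example.com.  300 IN SRV 0 0 8020 nn1.example.com.
    _hdfs._tcp.hadoop-cluster2.example.com.  300 IN SRV 0 0 8020 nn2.example.com.
    ; or the multiple-A-record form, with the client trying each address
    hadoop-cluster2.example.com.  300 IN A 10.1.0.11
    hadoop-cluster2.example.com.  300 IN A 10.1.0.12

The client would then resolve the logical name through DNS instead of a
locally configured list of physical hosts.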
On Wed, Nov 6, 2013 at 7:36 AM, Bobby Evans <[EMAIL PROTECTED]> wrote:

> Suresh,
>
> You are correct, I did not explain myself very well. If one of the
> namenodes has a hardware failure, then in order to avoid updating the
> configs for every single service that talks to HDFS, you have to make
> sure the replacement box appears to the network to be exactly the same
> as the original. This is not impossible, as you mentioned.
>
> The more common case when this is problematic is upgrading clusters from
> non-HA to HA, or adding in new HA clusters, because there is no existing
> IP address/config to be copied. Every time this happens, all existing
> services must have new configs pushed to be able to talk to the
> new/updated HDFS. This includes Gateways, RM, Compute Nodes, Oozie
> Servers, etc.
>
> Again, this is not that big of a deal for a small setup, but for a large
> setup it can be painful.
>
> --Bobby
>
> On 11/5/13 4:57 PM, "Suresh Srinivas" <[EMAIL PROTECTED]> wrote:
>
> >On Tue, Nov 5, 2013 at 6:57 AM, Bobby Evans <[EMAIL PROTECTED]> wrote:
> >
> >> But that does present a problem if you have to change the DNS address of
> >> one of the HA namenodes.
> >
> >
> >Not sure what you mean by this. Do you mean the hostname of one of the
> >namenodes changes? If so, why is this not a problem for a single
> >namenode deployment? How do applications addressing a namenode in a
> >different cluster handle the change?
> >
> >
> >> It forces you to update the config on all other clusters that want
> >> to talk to it. If you only have a few clusters that is probably not
> >> a big deal, but it can be problematic if you have many different
> >> clusters that talk to each other.
> >>
> >> --Bobby
> >>
> >> On 11/4/13 4:15 PM, "lohit" <[EMAIL PROTECTED]> wrote:
> >>
> >> >Thanks Suresh!
> >> >
> >> >
> >> >2013/11/4 Suresh Srinivas <[EMAIL PROTECTED]>
> >> >
> >> >> Lohit,
> >> >>
> >> >> The option you have enumerated at the end is the current way to
> >> >> set up a multi-cluster environment. That is, all the client-side
> >> >> configurations will include the following:
> >> >> - Logical service names (either for federation or HA)
> >> >> - The corresponding physical namenode addresses
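A minimal sketch of what those two items look like in a client-side
hdfs-site.xml; the property names are the standard HDFS HA client
settings, while the nameservice and host names are made up:

    <!-- logical service names visible to this client -->
    <property>
      <name>dfs.nameservices</name>
      <value>hadoop-cluster1,hadoop-cluster2</value>
    </property>
    <!-- namenodes behind one of the logical names -->
    <property>
      <name>dfs.ha.namenodes.hadoop-cluster2</name>
      <value>nn1,nn2</value>
    </property>
    <!-- physical addresses backing those namenodes -->
    <property>
      <name>dfs.namenode.rpc-address.hadoop-cluster2.nn1</name>
      <value>nn1.example.com:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.hadoop-cluster2.nn2</name>
      <value>nn2.example.com:8020</value>
    </property>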
> >> >>
> >> >> For simpler management, one could use XInclude to pull in an XML
> >> >> document that defines all the namespaces and namenodes.
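For instance (the included file name is hypothetical; Hadoop
configuration files accept standard XInclude):

    <?xml version="1.0"?>
    <configuration xmlns:xi="http://www.w3.org/2001/XInclude">
      <!-- shared definitions of every cluster's nameservices/namenodes -->
      <xi:include href="all-clusters-hdfs.xml"/>
      <!-- cluster-local properties follow -->
    </configuration>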
> >> >>
> >> >> Regards,
> >> >> Suresh
> >> >>
> >> >>
> >> >> On Mon, Nov 4, 2013 at 2:02 PM, lohit <[EMAIL PROTECTED]> wrote:
> >> >>
> >> >> > Hello Devs,
> >> >> >
> >> >> > With Hadoop 1.0, when there was a single namespace, one could
> >> >> > access any HDFS cluster using any other Hadoop config. Something
> >> >> > like this:
> >> >> >
> >> >> > hadoop --config /path/to/hadoop-cluster1 hdfs://hadoop-cluster2:8020/
> >> >> >
> >> >> > Since the NameNode host and port were passed directly as part of
> >> >> > the URI, if the hdfs client version matched, one could talk to
> >> >> > different clusters without needing access to cluster-specific
> >> >> > configuration.
> >> >> >
> >> >> > With Hadoop 2.0 or HA mode, we only specify a logical name for
> >> >> > the namenode and rely on hdfs-site.xml to resolve that logical
> >> >> > name to the two underlying namenode hosts.
> >> >> >
> >> >> > So, you cannot do something like
> >> >> >
> >> >> > hadoop --config /path/to/hadoop-cluster1 hdfs://hadoop-cluster2-logicalname/
> >> >> >
> >> >> > since /path/to/hadoop-cluster1/hdfs-site.xml does not have
> >> >> > information about hadoop-cluster2-logicalname's namenodes.
Todd Lipcon
Software Engineer, Cloudera