It depends on the number of replicas and the Hadoop rack configuration.
It is possible to have replicas in both datacenters.
What rack configuration do you plan to use? You can implement your
own mapping and set it with the topology.node.switch.mapping.impl property.
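
If writing your own mapping class is more than you need, the default script-based mapping can call an external script instead (configured through topology.script.file.name in Hadoop 1.x, net.topology.script.file.name in 2.x). Below is a rough sketch of such a script, not a tested production setup. The subnet prefixes and the rack names /dc1 and /dc2 are assumptions made up for illustration; replace them with your own layout. Hadoop passes one or more node addresses as arguments and expects one rack path per argument on stdout.

```python
#!/usr/bin/env python3
"""Hypothetical topology script: maps node addresses to rack paths."""
import sys

# Assumed layout (illustrative only): one subnet per datacenter,
# and each datacenter is treated as a single rack.
SUBNET_TO_RACK = {
    "10.1.": "/dc1",
    "10.2.": "/dc2",
}

# Returned for any address that matches no known subnet.
DEFAULT_RACK = "/default-rack"


def resolve(address):
    """Return the rack path for one node address."""
    for prefix, rack in SUBNET_TO_RACK.items():
        if address.startswith(prefix):
            return rack
    return DEFAULT_RACK


if __name__ == "__main__":
    # One rack path per input argument, in the same order.
    for arg in sys.argv[1:]:
        print(resolve(arg))
```

With a mapping like this, the default placement policy puts a block's replicas on more than one rack, so each block should end up with at least one copy in each datacenter as long as the replication factor is 2 or more.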
On 2013-07-15 23:49, Niels Basjes wrote:
> Last week we had a discussion at work regarding setting up our new
> Hadoop cluster(s).
> One of the things that has changed is that the importance of the
> Hadoop stack is growing so we want to be "more available".
> One of the points we talked about was setting up the cluster in such
> a way that the nodes are physically located in two separate datacenters
> (on opposite sides of the same city) with a big network connection in
> between.
> We're currently talking about a cluster in the 50 nodes range, but
> will grow over time.
> The advantages I see:
> - More CPU power available for jobs.
> - The data is automatically copied between the datacenters as long as
> we configure them to be different racks.
> The disadvantages I see:
> - If the network goes out then one half is dead and the other half
> will most likely go to safemode because the recovering of the missing
> replicas will fill up the disks fast.
> What things should we consider also?
> Has anyone any experience with such a setup?
> Is it a good idea to do this?
> What are better options for us to consider?
> Thanks for any input.