Re: Multidata center support
Hi friends,

Hello Baskar, I think rack awareness and data center awareness are different,
and similarly nodes and data centers are different things from Hadoop's
perspective. Ideally they should be handled the same way, since nodes can sit
in different data centers. However, I think Hadoop does not replicate data
across data centers. I am not sure about this (can anyone please comment?).
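
For reference, rack awareness is normally wired up through a topology script (the `net.topology.script.file.name` property in Hadoop 2.x). Hadoop calls the script with one or more host addresses as arguments and reads one network location per address from standard output. Below is a minimal sketch of such a script that encodes the data center as the first path component. The addresses, data center names and rack names are made up for illustration, and as far as I know the default placement policy only looks at the rack, so this by itself does not give a "one copy per data center" rule.

```python
#!/usr/bin/env python3
"""Hypothetical topology script: maps host addresses to /datacenter/rack paths."""
import sys

# Assumed example mapping; every address and name here is made up.
TOPOLOGY = {
    "10.1.0.11": "/us/rack1",
    "10.1.0.12": "/us/rack2",
    "10.2.0.11": "/europe/rack1",
}

# Fallback for unknown hosts. It has two levels on purpose, so that every
# location in this sketch has the same depth.
DEFAULT_LOCATION = "/default/rack"


def resolve(hosts):
    """Return one network location per host, in the same order as the input."""
    return [TOPOLOGY.get(host, DEFAULT_LOCATION) for host in hosts]


if __name__ == "__main__":
    # Addresses arrive as command-line arguments; the locations are printed
    # separated by whitespace, one per argument.
    print(" ".join(resolve(sys.argv[1:])))
```

Running it by hand with a couple of addresses as arguments is a quick way to check the mapping before pointing the NameNode at it.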

Federation can provide different NameNodes, so you can create independent
clusters, for example one cluster at one data center and another cluster at a
different data center. But if Hadoop can replicate across data centers, then
we need only one federated cluster for all data centers :). Are any of you
using a single federated cluster across multiple data centers in production?
For example:


One federated cluster spanning the US and Europe data centers (if Hadoop
can replicate across data centers):

NN1 ------ US        DN1 ------ US
NN2 ------ Europe    DN2 ------ Europe

In this case data can be replicated to both DN1 and DN2.


Two independent clusters, one per data center, in the US and Europe (if
Hadoop cannot replicate across data centers):

cluster 1                           cluster 2

NN1 ------ US    DN1 ------ US      NN2 ------ Europe    DN2 ------ Europe

In this case data cannot be replicated from cluster 1 to DN2 or vice versa.
Can anyone clarify which of these would be the right and optimal setup for Hadoop?
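
To make point 1 of Baskar's quoted message below concrete (two copies on one rack, one copy on another), here is a simplified model of the default placement rule for replication factor 3, with each data center treated as one "rack". This is only a sketch under stated assumptions: the node and data center names are invented, and the real policy also handles load, free space and fallbacks that are ignored here.

```python
import random

# Assumed example cluster: three data centers, each modeled as a single rack.
CLUSTER = {
    "dc-us": ["us-1", "us-2", "us-3"],
    "dc-europe": ["eu-1", "eu-2", "eu-3"],
    "dc-asia": ["as-1", "as-2", "as-3"],
}


def place_replicas(nodes_by_rack, writer_rack, rng):
    """Pick three (rack, node) pairs following the simplified default rule.

    1st replica: a node on the writer's rack.
    2nd replica: a node on a different rack.
    3rd replica: another node on the same rack as the 2nd replica.
    """
    first = rng.choice(nodes_by_rack[writer_rack])
    remote_rack = rng.choice([r for r in nodes_by_rack if r != writer_rack])
    second = rng.choice(nodes_by_rack[remote_rack])
    third = rng.choice([n for n in nodes_by_rack[remote_rack] if n != second])
    return [(writer_rack, first), (remote_rack, second), (remote_rack, third)]


if __name__ == "__main__":
    placement = place_replicas(CLUSTER, "dc-us", random.Random())
    for rack, node in placement:
        print(rack, node)
    # With three data centers, one of them never receives a copy of this block.
    print("data centers used:", len({rack for rack, _ in placement}))
```

Under this model every block lands in exactly two of the three data centers, which is the gap described in the quoted message.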

On Wed, Sep 4, 2013 at 2:20 AM, Baskar Duraikannu <

> Rahul
> Are you talking about rack-awareness script?
> I did go through rack awareness. Here are the problems with rack awareness
> with respect to my (given) "business requirement":
> 1. By default, Hadoop places two copies on the same rack and one copy on
> some other rack. This would work as long as we have two data centers. If
> the business wants three data centers, the data would not be spread across
> all of them. Separately, there is a question of whether this is the right
> thing to do or not. I have been promised by the business that they would
> buy enough bandwidth that the data centers will be only a few milliseconds
> apart (in latency).
> 2. I believe Hadoop automatically re-replicates data if one or more nodes
> are down. Assume one out of two data centers goes down. There will be a
> massive data flow to create additional copies. When I say data center
> support, I should be able to configure Hadoop to say:
>      a) Maintain 1 copy per data center
>      b) If any data center goes down, don't create additional copies.
> The requirements above would essentially move Hadoop from a strongly
> consistent to a weak/eventually consistent model. Since this changes the
> fundamental architecture, it will probably break all sorts of things...
> It might never be possible in Hadoop.
> Thoughts?
> Sadak
> Is there a way to implement above requirement via Federation?
> Thanks
> Baskar
> ------------------------------
> Date: Sun, 1 Sep 2013 00:20:04 +0530
> Subject: Re: Multidata center support
> What do you think, friends? I think Hadoop clusters can run across multiple
> data centers using federation.
> On Sat, Aug 31, 2013 at 8:39 PM, Visioner Sadak <[EMAIL PROTECTED]> wrote:
> The only problem, I guess, is that Hadoop won't be able to duplicate data
> from one data center to another, but I guess I can identify DataNodes or
> NameNodes from another data center. Correct me if I am wrong.
> On Sat, Aug 31, 2013 at 7:00 PM, Visioner Sadak <[EMAIL PROTECTED]> wrote: