Re: Multidata center support
Rahul Bhattacharjee 2013-09-04, 04:34
Under-replicated blocks are also consistent from a consumer's point of view.
Care to explain the relation of weak consistency to Hadoop?
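A minimal sketch of how one could observe this, assuming the hdfs client is
on the PATH (the path and the output parsing are illustrative): hdfs fsck
reports blocks whose replica count is below target, and the affected files
stay readable while the namenode schedules re-replication.

import re
import subprocess

def under_replicated_count(path="/"):
    # fsck's summary contains a line like "Under-replicated blocks: N (...)";
    # a non-zero count does not make the data unreadable to consumers.
    out = subprocess.run(["hdfs", "fsck", path],
                         capture_output=True, text=True).stdout
    match = re.search(r"Under-replicated blocks:\s+(\d+)", out)
    return int(match.group(1)) if match else 0

print(under_replicated_count())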
On Wed, Sep 4, 2013 at 9:56 AM, Rahul Bhattacharjee <[EMAIL PROTECTED]> wrote:
> Adam's response makes more sense to me: replicate generated data offline
> from one cluster to another across data centers.
> Not sure if a configurable block placement policy is supported in Hadoop.
> If yes, then along with rack awareness you should be able to achieve the
> same; a sketch of the rack-awareness half follows below.
> I could not follow your question related to weak consistency.
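> A minimal sketch of that rack-awareness half, assuming the cluster uses a
> topology script (pointed to by net.topology.script.file.name in
> core-site.xml) and that the 10.1.x.x / 10.2.x.x / 10.3.x.x subnets map to
> the three data centers (the addressing is hypothetical). Hadoop invokes
> the script with one or more IPs or hostnames as arguments and expects one
> location path per argument on stdout:
>
> #!/usr/bin/env python
> import sys
>
> # Hypothetical subnet-to-location table; the paths follow the usual
> # /datacenter/rack convention used by HDFS network topology.
> LOCATIONS = {
>     "10.1.": "/dc1/rack1",
>     "10.2.": "/dc2/rack1",
>     "10.3.": "/dc3/rack1",
> }
>
> for host in sys.argv[1:]:
>     # One output line per argument; unknown hosts fall back to the
>     # conventional /default-rack.
>     location = next((loc for prefix, loc in LOCATIONS.items()
>                      if host.startswith(prefix)), "/default-rack")
>     print(location)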
> On Wed, Sep 4, 2013 at 2:20 AM, Baskar Duraikannu <[EMAIL PROTECTED]> wrote:
>> Are you talking about the rack-awareness script?
>> I did go through rack awareness. Here are the problems with rack
>> awareness with respect to my (given) business requirement:
>> 1. Hadoop, by default, places two copies on the same rack and one copy on
>> some other rack. This would work as long as we have two data centers; if
>> the business wants three data centers, then data would not be spread
>> across all of them. Separately, there is a question around whether that is
>> the right thing to do or not. I have been promised by the business that
>> they would buy enough bandwidth such that the data centers will be a few
>> milliseconds apart (in terms of latency).
>> 2. I believe Hadoop automatically re-replicates data if one or more nodes
>> go down. Assume one of the two data centers goes down: there will be a
>> massive data flow to create additional copies. When I say data center
>> support, I mean I should be able to configure Hadoop to:
>> a) Maintain one copy per data center.
>> b) If any data center goes down, don't create additional copies.
>> The requirements I am pointing at would essentially move Hadoop from a
>> strongly consistent to a weak/eventually consistent model. Since this
>> changes the fundamental architecture, it would probably break all sorts
>> of things... it might never be possible in Hadoop.
>> Is there a way to implement the above requirement via Federation?
>> Date: Sun, 1 Sep 2013 00:20:04 +0530
>> Subject: Re: Multidata center support
>> From: [EMAIL PROTECTED]
>> To: [EMAIL PROTECTED]
>> What do you think, friends? I think Hadoop clusters can run on multiple
>> data centers using FEDERATION.
>> On Sat, Aug 31, 2013 at 8:39 PM, Visioner Sadak <[EMAIL PROTECTED]> wrote:
>> The only problem, I guess, is that Hadoop won't be able to duplicate data
>> from one data center to another; but I guess it can identify datanodes or
>> namenodes from another data center. Correct me if I am wrong.
>> On Sat, Aug 31, 2013 at 7:00 PM, Visioner Sadak <[EMAIL PROTECTED]> wrote:
>> Let's say that you have some machines in Europe and some in the US. I
>> think you just need the IPs and to configure them in your cluster setup,
>> and it will work...
>> On Sat, Aug 31, 2013 at 7:52 AM, Jun Ping Du <[EMAIL PROTECTED]> wrote:
>> Although you can add a data center layer to your network topology, it is
>> never enabled in Hadoop, as replica placement and task scheduling lack
>> support for it. There is some work to add layers other than rack and node
>> under HADOOP-8848, but it may not suit your case. I agree with Adam that a
>> cluster spanning multiple data centers does not seem to make sense, even
>> for the DR case. Do you have other cases that require such a deployment?
>> *From: *"Adam Muise" <[EMAIL PROTECTED]>
>> *To: *[EMAIL PROTECTED]
>> *Sent: *Friday, August 30, 2013 6:26:54 PM
>> *Subject: *Re: Multidata center support
>> Nothing has changed. DR best practice is still one (or more) clusters per
>> site, with replication handled via DistCp (distributed copy) or some
>> variation of it. A cluster spanning multiple data centers is a poor idea
>> right now.
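>> A minimal sketch of that replication step, assuming the hadoop client is
>> on the PATH and that nn-dc1/nn-dc2 are the active namenodes of the two
>> per-site clusters (hypothetical hostnames and path):
>>
>> import subprocess
>>
>> def replicate(path):
>>     # DistCp runs as a MapReduce job; -update copies only files that
>>     # changed since the last run, so it can be scheduled periodically.
>>     subprocess.run(["hadoop", "distcp", "-update",
>>                     "hdfs://nn-dc1:8020" + path,
>>                     "hdfs://nn-dc2:8020" + path],
>>                    check=True)  # raise if the copy job fails
>>
>> replicate("/data/events")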