MapReduce, mail # user - Multidata center support


Re: Multidata center support
Rahul Bhattacharjee 2013-09-04, 04:34
Under-replicated blocks are also consistent from a consumer's point of view. Care
to explain the relation of weak consistency to Hadoop?

Thanks,
Rahul
On Wed, Sep 4, 2013 at 9:56 AM, Rahul Bhattacharjee <[EMAIL PROTECTED]
> wrote:

> Adam's response makes more sense to me: offline-replicate generated data
> from one cluster to another across data centers.
>
> Not sure if a configurable block placement policy is
> supported in Hadoop. If yes, then along with rack awareness, you
> should be able to achieve the same.
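Rack awareness is driven by a user-supplied executable configured via `net.topology.script.file.name`: Hadoop passes it IPs or hostnames and expects one network path per input on stdout. A minimal sketch (the subnets, rack paths, and data-center names below are invented for illustration):

```python
#!/usr/bin/env python3
# Hypothetical topology script for Hadoop's net.topology.script.file.name.
# Hadoop invokes it with one or more IPs/hostnames as arguments and reads
# one topology path per argument from stdout.
import sys

# Assumed layout: two data centers, each with its own subnets (made up).
SUBNET_TO_RACK = {
    "10.1.1.": "/dc1/rack1",
    "10.1.2.": "/dc1/rack2",
    "10.2.1.": "/dc2/rack1",
}
DEFAULT_RACK = "/default-rack"

def resolve(host):
    """Return the topology path for a host, falling back to a default rack."""
    for prefix, rack in SUBNET_TO_RACK.items():
        if host.startswith(prefix):
            return rack
    return DEFAULT_RACK

if __name__ == "__main__":
    print(" ".join(resolve(h) for h in sys.argv[1:]))
```

Note that encoding the data center into the path this way only affects how Hadoop groups nodes; as discussed later in the thread, the stock placement policy still only distinguishes rack vs. non-rack.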
>
> I could not follow your question related to weak consistency.
>
> Thanks,
> Rahul
>
>
>
> On Wed, Sep 4, 2013 at 2:20 AM, Baskar Duraikannu <
> [EMAIL PROTECTED]> wrote:
>
>> Rahul
>> Are you talking about rack-awareness script?
>>
>> I did go through rack awareness. Here are the problems with rack
>> awareness w.r.t. my (given) business requirement:
>>
>> 1. By default, Hadoop places two copies on the same rack and one copy on
>> some other rack. This would work as long as we have two data centers. If
>> the business wants three data centers, then data would not be spread
>> across all of them. Separately, there is a question of whether this is the
>> right thing to do at all. The business has promised to buy enough
>> bandwidth that the data centers will be only a few milliseconds apart (in
>> latency).
>>
>> 2. I believe Hadoop automatically re-replicates data if one or more nodes
>> go down. Assume one of two data centers goes down: there will be a
>> massive data flow to create additional copies. When I say data center
>> support, I should be able to configure Hadoop to:
>>      a) maintain one copy per data center;
>>      b) if any data center goes down, not create additional copies.
>>
>> The requirements above would essentially move Hadoop from a strongly
>> consistent to a weak/eventually consistent model. Since this changes the
>> fundamental architecture, it would probably break all sorts of things...
>> It might never be possible in Hadoop.
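The selection logic being asked for in (a) and (b) above can be sketched in a few lines. This is purely hypothetical, not Hadoop's `BlockPlacementPolicy` API; it only illustrates the requested behavior: one replica per data center, and no substitute copies when a data center is down.

```python
# Hypothetical one-replica-per-data-center chooser (NOT a Hadoop API).
import random

def choose_replica_nodes(nodes_by_dc, replication=3):
    """Pick at most one node from each data center, up to `replication`.

    nodes_by_dc: dict mapping data-center name -> list of live candidate
    nodes. A downed data center (empty list) is simply skipped, so no
    additional copies are created elsewhere -- requirement (b).
    """
    chosen = []
    for dc, nodes in nodes_by_dc.items():
        if len(chosen) >= replication:
            break
        if nodes:  # skip a downed/empty data center instead of re-replicating
            chosen.append((dc, random.choice(nodes)))
    return chosen
```

For example, with `{"dc1": ["n1", "n2"], "dc2": [], "dc3": ["n5"]}` this returns one node from dc1 and one from dc3, and deliberately nothing for dc2.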
>>
>> Thoughts?
>>
>> Sadak
>> Is there a way to implement above requirement via Federation?
>>
>> Thanks
>> Baskar
>>
>>
>> ------------------------------
>> Date: Sun, 1 Sep 2013 00:20:04 +0530
>>
>> Subject: Re: Multidata center support
>> From: [EMAIL PROTECTED]
>> To: [EMAIL PROTECTED]
>>
>>
>> What do you think, friends? I think Hadoop clusters can run on multiple
>> data centers using FEDERATION.
>>
>>
>> On Sat, Aug 31, 2013 at 8:39 PM, Visioner Sadak <[EMAIL PROTECTED]
>> > wrote:
>>
>> The only problem, I guess, is that Hadoop won't be able to duplicate data
>> from one data center to another. But I guess I can identify data nodes or
>> namenodes from another data center; correct me if I am wrong.
>>
>>
>> On Sat, Aug 31, 2013 at 7:00 PM, Visioner Sadak <[EMAIL PROTECTED]
>> > wrote:
>>
>> Let's say that
>>
>> you have some machines in Europe and some in the US. I think you just need
>> the IPs and to configure them in your cluster setup;
>> it will work...
>>
>>
>> On Sat, Aug 31, 2013 at 7:52 AM, Jun Ping Du <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>     Although you can set a datacenter layer in your network topology, it is
>> not enabled in Hadoop, since replica placement and task scheduling support
>> are lacking. There is some work to add layers other than rack and node
>> under HADOOP-8848, but it may not suit your case. I agree with Adam that a
>> cluster spanning multiple data centers does not make sense even for the DR
>> case. Do you have other cases that require such a deployment?
>>
>> Thanks,
>>
>> Junping
>>
>> ------------------------------
>> *From: *"Adam Muise" <[EMAIL PROTECTED]>
>> *To: *[EMAIL PROTECTED]
>> *Sent: *Friday, August 30, 2013 6:26:54 PM
>> *Subject: *Re: Multidata center support
>>
>>
Nothing has changed. DR best practice is still one (or more) clusters per
site, with replication handled via distributed copy (DistCp) or some variation
of it. A cluster spanning multiple data centers is a poor idea right now.
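The distributed-copy approach Adam describes boils down to running `hadoop distcp <src> <dst>` between the two clusters' namenodes. A small sketch that only builds the argument list (the namenode addresses and paths are made up, and actually executing it requires a Hadoop client on the machine):

```python
# Sketch of cross-cluster replication via DistCp. `hadoop distcp` is the
# real CLI; everything else here (hosts, paths) is a hypothetical example.
def build_distcp_command(src_nn, dst_nn, path, update=True):
    """Build the argument list for copying `path` from one cluster to another."""
    cmd = ["hadoop", "distcp"]
    if update:
        cmd.append("-update")  # only copy files missing or changed at the target
    cmd.append(f"hdfs://{src_nn}/{path.lstrip('/')}")
    cmd.append(f"hdfs://{dst_nn}/{path.lstrip('/')}")
    return cmd
```

The resulting list can be handed to `subprocess.run`, or scheduled periodically (cron, Oozie) to keep the DR cluster's copy of a directory current.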