MapReduce >> mail # user >> Multidata center support


Baskar Duraikannu 2013-08-29, 19:12
Rahul Bhattacharjee 2013-08-30, 04:35
Adam Muise 2013-08-30, 10:26
Michael Segel 2013-09-05, 01:15
Rahul Bhattacharjee 2013-09-04, 04:26
Re: Multidata center support
Under-replicated blocks are also consistent from a consumer's point of view.
Care to explain how weak consistency relates to Hadoop?

Thanks,
Rahul
On Wed, Sep 4, 2013 at 9:56 AM, Rahul Bhattacharjee <[EMAIL PROTECTED]> wrote:

> Adam's response makes more sense to me: replicate the generated data
> offline from one cluster to another across data centers.
>
> I am not sure whether a configurable block placement policy is
> supported in Hadoop. If it is, then along with rack awareness you
> should be able to achieve the same.
>
> I could not follow your question related to weak consistency.
>
> Thanks,
> Rahul
>
>
>
> On Wed, Sep 4, 2013 at 2:20 AM, Baskar Duraikannu <[EMAIL PROTECTED]> wrote:
>
>> Rahul,
>> Are you talking about the rack-awareness script?
>>
>> I did go through rack awareness. Here are the problems with rack
>> awareness w.r.t. my (given) "business requirement":
>>
>> 1. Hadoop, by default, places two copies on the same rack and one copy
>> on some other rack. This would work as long as we have two data centers.
>> If the business wants three data centers, the data would not be spread
>> across all of them. Separately, there is a question of whether this is
>> the right thing to do at all. I have been promised by the business that
>> they will buy enough bandwidth that the data centers are only a few
>> milliseconds apart (in latency).
>>
>> 2. I believe Hadoop automatically re-replicates data if one or more nodes
>> are down. Assume one out of two data centers goes down: there will be a
>> massive data flow to create additional copies. When I say data center
>> support, I mean I should be able to configure Hadoop to:
>>      a) Maintain one copy per data center.
>>      b) Not create additional copies if any data center goes down.
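The two points above can be illustrated with a small simulation. This is only a sketch of a simplified version of the default three-replica placement (first copy on the writer's rack, second on a different rack, third on the same rack as the second), and it assumes each data center is modelled as a single rack. The names `place_replicas`, `lost_replicas`, and `dc1`..`dc3` are hypothetical and are not part of Hadoop.

```python
import random
from collections import Counter

def place_replicas(writer_rack, racks, rng):
    """Simplified default policy for 3 replicas: first on the writer's rack,
    second on a different rack, third on the same rack as the second."""
    first = writer_rack
    second = rng.choice([r for r in racks if r != first])
    third = second
    return [first, second, third]

def lost_replicas(placements, failed_rack):
    """Count replicas that lived on the failed rack and would be re-created."""
    return sum(p.count(failed_rack) for p in placements)

# Assumption: each data center is presented to Hadoop as one rack.
racks = ["dc1", "dc2", "dc3"]
rng = random.Random(0)
placements = [place_replicas(rng.choice(racks), racks, rng) for _ in range(1000)]

# How many distinct "racks" (data centers) does each block touch?
racks_per_block = Counter(len(set(p)) for p in placements)
print(racks_per_block)

# Replicas that would have to be re-created elsewhere if dc1 went down.
print(lost_replicas(placements, "dc1"))
```

Under this simplified model every block lands in exactly two of the three data centers, which is point 1, and the second number gives a feel for the re-replication traffic in point 2.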
>>
>> The requirements above would essentially move Hadoop from a strongly
>> consistent model to a weak/eventually consistent one. Since this changes
>> the fundamental architecture, it would probably break all sorts of
>> things... It might never be possible in Hadoop.
>>
>> Thoughts?
>>
>> Sadak,
>> Is there a way to implement the above requirements via Federation?
>>
>> Thanks
>> Baskar
>>
>>
>> ------------------------------
>> Date: Sun, 1 Sep 2013 00:20:04 +0530
>>
>> Subject: Re: Multidata center support
>> From: [EMAIL PROTECTED]
>> To: [EMAIL PROTECTED]
>>
>>
>> What do you think, friends? I think Hadoop clusters can run across
>> multiple data centers using FEDERATION.
>>
>>
>> On Sat, Aug 31, 2013 at 8:39 PM, Visioner Sadak <[EMAIL PROTECTED]> wrote:
>>
>> The only problem, I guess, is that Hadoop won't be able to duplicate data
>> from one data center to another, but I guess I can identify data nodes or
>> namenodes from another data center. Correct me if I am wrong.
>>
>>
>> On Sat, Aug 31, 2013 at 7:00 PM, Visioner Sadak <[EMAIL PROTECTED]> wrote:
>>
>> Let's say that you have some machines in Europe and some in the US. I
>> think you just need the IPs and to configure them in your cluster setup;
>> it will work...
>>
>>
>> On Sat, Aug 31, 2013 at 7:52 AM, Jun Ping Du <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>     Although you can set a datacenter layer in your network topology, it
>> is never enabled in Hadoop because replica placement and task scheduling
>> support for it is lacking. There is some work to add layers other than
>> rack and node under HADOOP-8848, but it may not suit your case. I agree
>> with Adam that a cluster spanning multiple data centers does not seem to
>> make sense, even for the DR case. Do you have other use cases for such a
>> deployment?
>>
>> Thanks,
>>
>> Junping
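For readers unfamiliar with how a datacenter layer would be declared at all, here is a minimal sketch of a rack-awareness topology script. Hadoop invokes the configured script (the `net.topology.script.file.name` property) with one or more IPs or hostnames as arguments and reads one location path per argument from standard output. The subnet table, the `/dcN/rackN` paths, and the `resolve` helper are hypothetical. As the message above notes, stock Hadoop does not act on the extra datacenter level for placement or scheduling.

```python
#!/usr/bin/env python3
import ipaddress
import sys

# Hypothetical subnet-to-location table; replace with your own addressing plan.
SUBNETS = {
    "10.1.0.0/16": "/dc1/rack1",
    "10.2.0.0/16": "/dc2/rack1",
}
# Assumption: the fallback keeps the same depth as the paths above so that
# all nodes sit at one level of the topology tree.
DEFAULT = "/default-dc/default-rack"

def resolve(host):
    """Map an IP address string to a /datacenter/rack path."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return DEFAULT  # hostnames are not resolved in this sketch
    for subnet, location in SUBNETS.items():
        if addr in ipaddress.ip_network(subnet):
            return location
    return DEFAULT

if __name__ == "__main__":
    # One location per argument, separated by whitespace.
    print(" ".join(resolve(h) for h in sys.argv[1:]))
```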
>>
>> ------------------------------
>> *From: *"Adam Muise" <[EMAIL PROTECTED]>
>> *To: *[EMAIL PROTECTED]
>> *Sent: *Friday, August 30, 2013 6:26:54 PM
>> *Subject: *Re: Multidata center support
>>
>>
>> Nothing has changed. DR best practice is still one (or more) clusters per
>> site and replication is handled via distributed copy or some variation of
>> it. A cluster spanning multiple data centers is a poor idea right now.
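The "distributed copy" mentioned above is Hadoop's DistCp tool. A minimal sketch of driving it from a scheduled job might look like the following; the namenode hostnames, port, and paths are hypothetical placeholders, and `distcp_command` is a helper invented for this illustration.

```python
import subprocess

def distcp_command(src_uri, dst_uri, update=True):
    """Build the argv for a DistCp run from one cluster to another."""
    cmd = ["hadoop", "distcp"]
    if update:
        # -update copies only files that are missing or differ at the target.
        cmd.append("-update")
    cmd += [src_uri, dst_uri]
    return cmd

# Hypothetical namenode addresses for the source and DR clusters.
cmd = distcp_command("hdfs://nn-dc1:8020/data/out", "hdfs://nn-dc2:8020/data/out")
print(" ".join(cmd))

# Uncomment on a host with the Hadoop client installed and configured:
# subprocess.run(cmd, check=True)
```

Each site keeps its own cluster and its own replication factor, and only the generated data crosses the WAN link when the job runs.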
Visioner Sadak 2013-09-05, 08:03