Re: Multiple data centre in Hadoop
Robert Evans 2012-04-19, 21:33
If you want to start an open source project for this, I am sure there are others with the same problem who might be very willing to help out. :)
On 4/19/12 4:31 PM, "Michael Segel" <[EMAIL PROTECTED]> wrote:
I don't know of any open source solution for doing this...
And yeah, it's something one can't talk about... ;-)
On Apr 19, 2012, at 4:28 PM, Robert Evans wrote:
> Where I work we have done some things like this, but none of them are open source, and I have not really been directly involved with the details of it. I can guess about what it would take, but that is all it would be at this point.
> On 4/17/12 5:46 PM, "Abhishek Pratap Singh" <[EMAIL PROTECTED]> wrote:
> Thanks Bobby, I'm looking for something like this... Now the question is
> what is the best strategy for doing Hot/Hot or Hot/Warm.
> I need to consider CPU and network bandwidth, and also need to decide from
> which layer this replication should start.
> On Mon, Apr 16, 2012 at 7:08 AM, Robert Evans <[EMAIL PROTECTED]> wrote:
>> Hi Abhishek,
>> Manu is correct about High Availability within a single colo. I realize
>> that in some cases you have to have fail over between colos. I am not
>> aware of any turnkey solution for things like that, but generally what you
>> want to do is to run two clusters, one in each colo, either hot/hot or
>> hot/warm, and I have seen both depending on how quickly you need to fail
>> over. In hot/hot the input data is replicated to both clusters and the
>> same software is run on both. In this case, though, you have to be fairly
>> sure that your processing is deterministic, or the results could be
>> slightly different (i.e. no generating of random IDs). In hot/warm the
>> data is replicated from one colo to the other at defined checkpoints. The
>> data is only processed on one of the grids, but if that colo goes down the
>> other one can take up the processing from wherever the last checkpoint
>> was taken. I hope that helps.
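The checkpoint replication described above is typically done with DistCp between the two clusters' HDFS namespaces. A minimal sketch of one checkpoint copy, where the NameNode hostnames and checkpoint paths are illustrative assumptions, not anything from the thread:

```shell
# Copy a completed checkpoint directory from the primary colo to the DR colo.
# -update copies only files that differ from the destination; -delete removes
# destination files that no longer exist at the source (valid with -update).
SRC="hdfs://nn-primary:8020/data/checkpoints/2012-04-19"
DST="hdfs://nn-dr:8020/data/checkpoints/2012-04-19"
hadoop distcp -update -delete "$SRC" "$DST"
```

Scheduling a copy like this after each checkpoint bounds how far behind the warm cluster can be, and therefore how much processing it has to redo on failover.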
>> On 4/12/12 5:07 AM, "Manu S" <[EMAIL PROTECTED]> wrote:
>> Hi Abhishek,
>> 1. Use multiple directories for *dfs.name.dir* & *dfs.data.dir*, etc.
>> * Recommendation: write to *two local directories on different
>> physical volumes*, and to an *NFS-mounted* directory
>> - Data will be preserved even in the event of a total failure of the
>> NameNode machines
>> * Recommendation: *soft-mount the NFS* directory
>> - If the NFS mount goes offline, this will not cause the NameNode
>> to fail
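As a sketch, recommendation 1 for the NameNode metadata might look like this in hdfs-site.xml (the paths are illustrative assumptions; the soft mount itself is an OS-level NFS mount option, not a Hadoop setting):

```xml
<!-- hdfs-site.xml: redundant NameNode metadata directories -->
<property>
  <name>dfs.name.dir</name>
  <!-- two local directories on different physical volumes,
       plus a soft-mounted NFS directory -->
  <value>/disk1/dfs/name,/disk2/dfs/name,/mnt/nfs/dfs/name</value>
</property>
```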
>> 2. *Rack awareness*
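For point 2, rack awareness is configured by pointing *topology.script.file.name* (in core-site.xml) at a script that maps datanode addresses to rack paths. A minimal sketch, where the subnet-to-rack assignments are made-up examples:

```shell
# Hypothetical rack-mapping function for a Hadoop topology script
# (wired up via topology.script.file.name in core-site.xml).
rack_of() {
  case "$1" in
    10.1.1.*) echo "/dc1/rack1" ;;
    10.1.2.*) echo "/dc1/rack2" ;;
    10.2.*)   echo "/dc2/rack1" ;;
    *)        echo "/default-rack" ;;
  esac
}

# Hadoop calls the script with one or more datanode IPs/hostnames
# and expects one rack path per line on stdout.
for host in "$@"; do
  rack_of "$host"
done
```

With replication factor 3 and this mapping in place, HDFS places block replicas across racks, so losing a whole rack does not lose any block.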
>> On Thu, Apr 12, 2012 at 2:18 AM, Abhishek Pratap Singh
>> <[EMAIL PROTECTED]> wrote:
>>> Thanks Robert.
>>> Is there a best practice or design that can address High Availability
>>> to a certain extent?
>>> On Wed, Apr 11, 2012 at 12:32 PM, Robert Evans <[EMAIL PROTECTED]> wrote:
>>>> No it does not. Sorry
>>>> On 4/11/12 1:44 PM, "Abhishek Pratap Singh" <[EMAIL PROTECTED]> wrote:
>>>> Hi All,
>>>> Just wanted to know if Hadoop supports more than one data centre. This is
>>>> for DR purposes and High Availability, where if one centre goes down the
>>>> other comes up.
>> Thanks & Regards
>> *Manu S*
>> SI Engineer - OpenSource & HPC
>> Wipro Infotech
>> Mob: +91 8861302855 Skype: manuspkd