Hadoop, mail # user - Multiple data centre in Hadoop


Re: Multiple data centre in Hadoop
Robert Evans 2012-04-19, 21:28
Where I work, we have done some things like this, but none of them are open source, and I have not been directly involved with the details. I can guess at what it would take, but that is all it would be at this point.

--Bobby
On 4/17/12 5:46 PM, "Abhishek Pratap Singh" <[EMAIL PROTECTED]> wrote:

Thanks, Bobby. I'm looking for something like this. Now the question is
what the best strategy is for Hot/Hot or Hot/Warm.
I need to consider CPU and network bandwidth, and I also need to decide from
which layer this replication should start.
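Network bandwidth is usually the first thing to size when choosing between the two setups. A rough way to check it is to estimate how long one day's data takes to cross the inter-colo link. This is a minimal sketch: the 70% usable-link fraction and the 500 GB/day over 1 Gbit/s figures are placeholder assumptions, not measurements.

```python
def replication_hours(bytes_per_day: float, link_bits_per_sec: float,
                      utilization: float = 0.7) -> float:
    """Hours needed to copy one day's data across the inter-colo link.

    `utilization` is an ASSUMED fraction of the link actually available
    for replication (protocol overhead, competing traffic, etc.).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    effective_bps = link_bits_per_sec * utilization
    seconds = bytes_per_day * 8 / effective_bps  # bytes -> bits
    return seconds / 3600


def fits_in_window(bytes_per_day: float, link_bits_per_sec: float,
                   window_hours: float, utilization: float = 0.7) -> bool:
    """True if a day's data can be replicated within `window_hours`."""
    return replication_hours(bytes_per_day, link_bits_per_sec,
                             utilization) <= window_hours


if __name__ == "__main__":
    # Placeholder figures: 500 GB/day over a 1 Gbit/s link.
    hours = replication_hours(500e9, 1e9)
    print(f"{hours:.2f} h per day of data")  # prints "1.59 h per day of data"
```

The same estimate applies to both options, but the inputs differ: hot/hot ships all input data to both colos and spends the processing CPU twice, while hot/warm only ships data at checkpoints and keeps the second cluster mostly idle.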

Regards,
Abhishek

On Mon, Apr 16, 2012 at 7:08 AM, Robert Evans <[EMAIL PROTECTED]> wrote:

> Hi Abhishek,
>
> Manu is correct about High Availability within a single colo. I realize
> that in some cases you need failover between colos. I am not aware of
> any turnkey solution for that, but generally what you want to do is run
> two clusters, one in each colo, either hot/hot or hot/warm; I have seen
> both, depending on how quickly you need to fail over. In hot/hot, the
> input data is replicated to both clusters and the same software runs on
> both. In that case, though, you have to be fairly sure that your
> processing is deterministic, or the results could differ slightly (e.g.,
> no generating of random IDs). In hot/warm, the data is replicated from
> one colo to the other at defined checkpoints. The data is processed on
> only one of the grids, but if that colo goes down, the other one can
> take over processing from wherever the last checkpoint was.
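The determinism requirement for hot/hot can be illustrated with record IDs. Calling `uuid.uuid4()` on each cluster would give the same record two different IDs; deriving the ID from where the record came from keeps both colos in agreement. This is a sketch only: the namespace URL and the `(source_file, offset)` key are hypothetical choices, and any stable key that both clusters see identically would do.

```python
import uuid

# HYPOTHETICAL fixed namespace for this pipeline. Both colos must use the
# same value so that identical input yields identical IDs.
PIPELINE_NS = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.com/ingest-pipeline")


def deterministic_record_id(source_file: str, offset: int) -> str:
    """Derive a record ID from its input location (a hypothetical key).

    uuid5 is a name-based hash, so the same (file, offset) pair always
    maps to the same ID on every cluster, unlike uuid4.
    """
    return str(uuid.uuid5(PIPELINE_NS, f"{source_file}:{offset}"))


if __name__ == "__main__":
    a = deterministic_record_id("/data/in/part-00000", 1024)
    b = deterministic_record_id("/data/in/part-00000", 1024)
    print(a == b)  # True
```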
>
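The hot/warm takeover step amounts to bookkeeping over checkpoints: the warm colo resumes after the newest checkpoint whose data actually arrived, and replays everything after it. A minimal sketch, assuming a hypothetical `Checkpoint` record (not a Hadoop API) and that checkpoints are replicated in batch order, so each replicated checkpoint covers all earlier batches:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Checkpoint:
    """HYPOTHETICAL record: processing is complete through `batch`, and
    `replicated` says whether that state reached the warm colo."""
    batch: int
    replicated: bool


def resume_point(checkpoints: List[Checkpoint]) -> Optional[int]:
    """Last batch the warm colo can trust, or None if nothing has been
    replicated yet. Assumes `checkpoints` is sorted by batch."""
    for cp in reversed(checkpoints):
        if cp.replicated:
            return cp.batch
    return None


def batches_to_replay(checkpoints: List[Checkpoint], latest_batch: int) -> List[int]:
    """Batches the warm colo must (re)process after failover."""
    start = resume_point(checkpoints)
    first = 0 if start is None else start + 1
    return list(range(first, latest_batch + 1))


if __name__ == "__main__":
    cps = [Checkpoint(10, True), Checkpoint(20, True), Checkpoint(30, False)]
    print(resume_point(cps))           # 20
    print(batches_to_replay(cps, 23))  # [21, 22, 23]
```

The gap between checkpoints bounds how much work is redone on failover, which is the trade-off against how often data is shipped between colos.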
> I hope that helps.
>
> --Bobby
>
> On 4/12/12 5:07 AM, "Manu S" <[EMAIL PROTECTED]> wrote:
>
> Hi Abhishek,
>
> 1. Use multiple directories for *dfs.name.dir*, *dfs.data.dir*, etc.
> * Recommendation for *dfs.name.dir*: write to *two local directories on
> different physical volumes* and to an *NFS-mounted* directory
> - Data will be preserved even in the event of a total failure of the
> NameNode machine
> * Recommendation: *soft-mount the NFS* directory
> - If the NFS mount goes offline, this will not cause the NameNode
> to fail
>
> 2. *Rack awareness*
>
> https://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf
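For point 1, *dfs.name.dir* takes a comma-separated list of directories, and the NameNode writes its metadata to each of them. The sketch below renders that property for hdfs-site.xml; the three paths are hypothetical mount points (two local volumes plus the NFS mount). In later Hadoop releases the property is named `dfs.namenode.name.dir`.

```python
import xml.etree.ElementTree as ET
from typing import List


def name_dir_property(dirs: List[str]) -> str:
    """Render a Hadoop <property> element listing NameNode metadata dirs.

    The value is comma-separated, so individual paths must not contain
    commas themselves.
    """
    if any("," in d for d in dirs):
        raise ValueError("directory paths must not contain commas")
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "dfs.name.dir"
    ET.SubElement(prop, "value").text = ",".join(dirs)
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # HYPOTHETICAL mount points: two local disks on separate physical
    # volumes plus a soft-mounted NFS export.
    print(name_dir_property([
        "/disk1/dfs/name",
        "/disk2/dfs/name",
        "/mnt/nfs/dfs/name",
    ]))
```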
>
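For point 2, rack awareness is driven by a topology script: Hadoop runs the configured script (`topology.script.file.name` in Hadoop 1.x, `net.topology.script.file.name` in later releases) with one or more host names or IPs as arguments and reads back whitespace-separated rack paths in the same order. A minimal sketch, where the IP-to-rack table is a hypothetical inventory:

```python
#!/usr/bin/env python3
"""Sketch of an HDFS rack-awareness topology script."""
import sys
from typing import List

# HYPOTHETICAL inventory; replace with the real host-to-rack mapping.
RACK_MAP = {
    "10.1.1.11": "/rack1",
    "10.1.1.12": "/rack1",
    "10.1.2.11": "/rack2",
    "10.1.2.12": "/rack2",
}
DEFAULT_RACK = "/default-rack"


def resolve(hosts: List[str]) -> List[str]:
    """Map each host to its rack, falling back to DEFAULT_RACK."""
    return [RACK_MAP.get(h, DEFAULT_RACK) for h in hosts]


if __name__ == "__main__":
    # One rack path per argument, in argument order.
    print(" ".join(resolve(sys.argv[1:])))
```

This spreads block replicas across racks within one cluster; on its own it does not give failover between data centres.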
> On Thu, Apr 12, 2012 at 2:18 AM, Abhishek Pratap Singh
> <[EMAIL PROTECTED]> wrote:
>
> > Thanks Robert.
> > Is there a best practice or design that can address High Availability
> > to a certain extent?
> >
> > ~Abhishek
> >
> > On Wed, Apr 11, 2012 at 12:32 PM, Robert Evans <[EMAIL PROTECTED]> wrote:
> >
> > > No, it does not. Sorry.
> > >
> > >
> > > On 4/11/12 1:44 PM, "Abhishek Pratap Singh" <[EMAIL PROTECTED]> wrote:
> > >
> > > Hi All,
> > >
> > > Just wanted to know if Hadoop supports more than one data centre. This
> > > is basically for DR purposes and High Availability, where if one centre
> > > goes down, the other can be brought up.
> > >
> > >
> > > Regards,
> > > Abhishek
> > >
> > >
> >
>
>
>
> --
> Thanks & Regards
> ----
> *Manu S*
> SI Engineer - OpenSource & HPC
> Wipro Infotech
> Mob: +91 8861302855                Skype: manuspkd
> www.opensourcetalk.co.in
>
>