Steve Ed 2011-10-21, 21:40
Rita 2011-11-07, 10:58
Ted Dunning 2011-11-07, 22:06
Rita 2011-11-08, 00:34
Ted Dunning 2011-11-08, 00:53
Rita 2011-11-08, 12:32
Ted Dunning 2011-11-08, 12:38
I agree with Ted's argument that 3x replication is way better than 2x. But
I do have to point out that, since 0.20.204, the loss of a disk no longer
causes the loss of a whole node (thankfully!) unless it's the system disk.
So in the example given, if you estimate a disk failure every 2 hours,
each node only has to re-replicate about 2GB of data, not 12GB. So about
1-in-72 such failures risks data loss, rather than 1-in-12. Which is still
unacceptable, so use 3x replication! :-)
On Mon, Nov 7, 2011 at 4:53 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> 3x replication has two effects. One is reliability. This is probably
> more important in large clusters than small.
> Another important effect is data locality during map-reduce. Having 3x
> replication allows mappers to have almost all invocations read from local
> disk. 2x replication compromises this. Even where you don't have local
> data, the bandwidth available to read from 3x replicated data is 1.5x the
> bandwidth available for 2x replication.
> To get a rough feel for how reliable you should consider a cluster, you
> can do a pretty simple computation. If you have 12 x 2T on a single
> machine and you lose that machine, the remaining copies of that data must
> be replicated before another disk fails. With HDFS and block-level
> replication, the remaining copies will be spread across the entire cluster
> to any disk failure is reasonably like to cause data loss. For a 1000 node
> cluster with 12000 disks, it is conservative to estimate a disk failure on
> average every 2 hours. Each node will have replicate about 12GB of data
> which will take about 500 seconds or about 9 or 10 minutes if you only use
> 25% of your network for re-replication. The probability of a disk failure
> during a 10 minute period is 1-exp(-10/120) = 8%. This means that roughly
> 1 in 12 full machine failures might cause data loss. You can pick
> whatever you like for the rate at which nodes die, but I don't think that
> this is acceptable.
> My numbers for disk failures are purposely somewhat pessimistic. If you
> change the MTBF for disks to 10 years instead of 3 years, then the
> probability of data loss after a machine failure drops, but only to about
> Now, I would be the first to say that these numbers feel too high, but I
> also would rather not experience enough data loss events to have a reliable
> gut feel for how often they should occur.
> My feeling is that 2x is fine for data you can reconstruct and which you
> don't need to read really fast, but not good enough for data whose loss
> will get you fired.
> On Mon, Nov 7, 2011 at 7:34 PM, Rita <[EMAIL PROTECTED]> wrote:
>> I have been running with 2x replication on a 500tb cluster. No issues
>> whatsoever. 3x is for super paranoid.
>> On Mon, Nov 7, 2011 at 5:06 PM, Ted Dunning <[EMAIL PROTECTED]>wrote:
>>> Depending on which distribution and what your data center power limits
>>> are you may save a lot of money by going with machines that have 12 x 2 or
>>> 3 tb drives. With suitable engineering margins and 3 x replication you can
>>> have 5 tb net data per node and 20 nodes per rack. If you want to go all
>>> cowboy with 2x replication and little space to spare then you can double
>>> that density.
>>> On Monday, November 7, 2011, Rita <[EMAIL PROTECTED]> wrote:
>>> > For a 1PB installation you would need close to 170 servers with 12 TB
>>> disk pack installed on them (with replication factor of 2). Thats a
>>> conservative estimate
>>> > CPUs: 4 cores with 16gb of memory
>>> > Namenode: 4 core with 32gb of memory should be ok.
>>> > On Fri, Oct 21, 2011 at 5:40 PM, Steve Ed <[EMAIL PROTECTED]> wrote:
>>> >> I am a newbie to Hadoop and trying to understand how to Size a Hadoop
>>> >> What are factors I should consider deciding the number of datanodes ?
>>> >> Datanode configuration ? CPU, Memory
Koji Noguchi 2011-11-11, 17:26
Ted Dunning 2011-11-11, 17:49
Steve Ed 2011-11-11, 17:59
Matt Foley 2011-11-11, 18:15
Todd Lipcon 2011-11-11, 18:37
Matt Foley 2011-11-15, 22:29