HDFS >> mail # user >> Sizing help


Steve Ed        2011-10-21, 21:40
Rita            2011-11-07, 10:58
Ted Dunning     2011-11-07, 22:06
Rita            2011-11-08, 00:34
Ted Dunning     2011-11-08, 00:53
Rita            2011-11-08, 12:32
Ted Dunning     2011-11-08, 12:38
Matt Foley      2011-11-11, 09:57
Koji Noguchi    2011-11-11, 17:26
Ted Dunning     2011-11-11, 17:49
Steve Ed        2011-11-11, 17:59
Matt Foley      2011-11-11, 18:15
Todd Lipcon     2011-11-11, 18:37
Re: Sizing help
Todd is correct.  The capability to recognize repaired disks and
re-incorporate them is not available in the current implementation of disk
fail-in-place.  So the datanode service does need to be restarted, at which
point it will re-join the cluster automatically, with all its working disks.
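
Todd's "less than 10 minutes" window in the quoted thread below lines up with
the NameNode's dead-node timeout.  A minimal sketch of that arithmetic in
Python, assuming the commonly cited defaults of a 3-second DataNode heartbeat
and a 5-minute recheck interval (the exact property names and defaults differ
between Hadoop versions):

    # A DataNode is only declared dead after missing heartbeats for
    # 2 * recheck_interval + 10 * heartbeat_interval.  The values below are
    # assumed defaults; check dfs.heartbeat.interval and the NameNode's
    # heartbeat recheck setting in your version.
    heartbeat_interval_s = 3        # assumed default DN heartbeat
    recheck_interval_s = 5 * 60     # assumed default NN recheck interval

    dead_node_timeout_s = 2 * recheck_interval_s + 10 * heartbeat_interval_s
    print(f"DataNode marked dead after ~{dead_node_timeout_s / 60:.1f} minutes")  # ~10.5

As long as the disk swap and the DataNode restart finish inside that window,
the NameNode never marks the node dead and no blocks are re-replicated.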

On Fri, Nov 11, 2011 at 10:37 AM, Todd Lipcon <[EMAIL PROTECTED]> wrote:

> On Fri, Nov 11, 2011 at 10:15 AM, Matt Foley <[EMAIL PROTECTED]>
> wrote:
> > Nope; hot swap :-)
>
> AFAIK you can't re-add the marked-dead disk to the DN, can you?
>
> But yea, you can hot-swap the disk, then kick the DN process, which
> should take less than 10 minutes. That means the NN won't ever notice
> it's down, and you won't incur any replication costs.
>
> -Todd
>
> >
> > On Nov 11, 2011, at 9:59 AM, Steve Ed <[EMAIL PROTECTED]> wrote:
> >
> > I understand that with 0.20.204, loss of a disk doesn't lose the node.  But
> > if we have to replace that lost disk, it again means scheduling the whole
> > node down, kicking off replication.
> >
> >
> >
> > From: Matt Foley [mailto:[EMAIL PROTECTED]]
> > Sent: Friday, November 11, 2011 1:58 AM
> > To: [EMAIL PROTECTED]
> > Subject: Re: Sizing help
> >
> >
> >
> > I agree with Ted's argument that 3x replication is way better than 2x.  But
> > I do have to point out that, since 0.20.204, the loss of a disk no longer
> > causes the loss of a whole node (thankfully!) unless it's the system disk.
> > So in the example given, if you estimate a disk failure every 2 hours, each
> > node only has to re-replicate about 2GB of data, not 12GB.  So about 1-in-72
> > such failures risks data loss, rather than 1-in-12.  Which is still
> > unacceptable, so use 3x replication! :-)
> >
> > --Matt
> >
> > On Mon, Nov 7, 2011 at 4:53 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> >
> > 3x replication has two effects.  One is reliability.  This is probably more
> > important in large clusters than small.
> >
> >
> >
> > Another important effect is data locality during map-reduce.  Having 3x
> > replication allows mappers to have almost all invocations read from local
> > disk.  2x replication compromises this.  Even where you don't have local
> > data, the bandwidth available to read from 3x replicated data is 1.5x the
> > bandwidth available for 2x replication.
> >
> >
> >
> > To get a rough feel for how reliable you should consider a cluster, you can
> > do a pretty simple computation.  If you have 12 x 2T on a single machine and
> > you lose that machine, the remaining copies of that data must be replicated
> > before another disk fails.  With HDFS and block-level replication, the
> > remaining copies will be spread across the entire cluster, so any disk
> > failure is reasonably likely to cause data loss.  For a 1000 node cluster
> > with 12000 disks, it is conservative to estimate a disk failure on average
> > every 2 hours.  Each node will have to replicate about 12GB of data, which
> > will take about 500 seconds, or about 9 or 10 minutes if you only use 25% of
> > your network for re-replication.  The probability of a disk failure during a
> > 10 minute period is 1-exp(-10/120) = 8%.  This means that roughly 1 in 12
> > full machine failures might cause data loss.  You can pick whatever you like
> > for the rate at which nodes die, but I don't think that this is acceptable.
> >
> >
> >
> > My numbers for disk failures are purposely somewhat pessimistic.  If you
> > change the MTBF for disks to 10 years instead of 3 years, then the
> > probability of data loss after a machine failure drops, but only to about
> > 2.5%.
> >
> >
> >
> > Now, I would be the first to say that these numbers feel too high, but I
> > also would rather not experience enough data loss events to have a reliable
> > gut feel for how often they should occur.
> >
> >
> >
> > My feeling is that 2x is fine for data you can reconstruct and which you
> > don't need to read really fast, but not good enough for data whose loss
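
To make the arithmetic above concrete, here is a short Python sketch.  The
cluster figures (1000 nodes, 12000 disks, about 12GB to re-replicate per
surviving node, 25% of the network used for re-replication, 3-year versus
10-year disk MTBF) come from Ted's message; the ~100 MB/s network speed is an
assumption chosen here to land near his ~500-second estimate.

    import math

    DISKS_IN_CLUSTER = 12_000

    def rereplication_window_min(gb_per_node, nic_mb_per_s=100, share=0.25):
        """Minutes each surviving node needs to re-replicate its share of the
        lost data, using only `share` of the assumed NIC bandwidth."""
        return gb_per_node * 1000 / (nic_mb_per_s * share) / 60

    def p_second_failure(window_min, disk_mtbf_years):
        """Probability that some other disk in the cluster fails during the
        re-replication window, treating failures as a Poisson process."""
        mean_min_between_failures = disk_mtbf_years * 365 * 24 * 60 / DISKS_IN_CLUSTER
        return 1 - math.exp(-window_min / mean_min_between_failures)

    print(f"window: ~{rereplication_window_min(12):.0f} min")    # ~8 min; Ted rounds to 10
    print(f"3-year disk MTBF:  {p_second_failure(10, 3):.1%}")   # ~7%, Ted's ~8%
    print(f"10-year disk MTBF: {p_second_failure(10, 10):.1%}")  # ~2.3%, Ted's ~2.5%

    # Matt's refinement: a single-disk failure only forces about 2GB per node,
    # so the window is roughly a sixth as long and the risk drops accordingly
    # (his 1-in-72 versus 1-in-12).
    print(f"single-disk failure: {p_second_failure(rereplication_window_min(2), 3):.1%}")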