Hadoop, mail # general - Hadoop Java Versions


Re: Hadoop Java Versions
Ted Dunning 2011-07-01, 00:16
You have to consider the long-term reliability as well.

Losing an entire set of 10 or 12 disks at once makes the overall reliability
of a large cluster very suspect.  This is because it becomes entirely too
likely that two additional drives will fail before the data on the off-line
node can be replicated.  For 100 nodes, that can decrease the average time
to data loss to less than a year.  This can only be mitigated in stock
Hadoop by keeping the number of drives per node relatively low.  MapR avoids
this by not failing nodes for trivial problems.
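
As a rough sketch of the failure mode described above: with 3x replication, data is actually lost only if the two disks holding the other copies of some block on the offline node also fail before re-replication finishes. The Python below works through that back-of-the-envelope arithmetic; the drive failure rate, node-offline rate, detection delay, block size and per-node recovery throughput are illustrative assumptions rather than figures from this thread, and the result is extremely sensitive to them.

HOURS_PER_YEAR = 24 * 365

def rereplication_hours(nodes, disks_per_node, disk_tb, fill,
                        mb_per_sec_per_node=100, detection_minutes=10):
    """Hours a failed node's blocks stay under-replicated: an assumed
    dead-node detection delay plus the time the surviving nodes need to
    re-copy its data, sharing the work evenly."""
    lost_bytes = disks_per_node * disk_tb * 1e12 * fill
    copy_seconds = lost_bytes / (nodes - 1) / (mb_per_sec_per_node * 1e6)
    return (detection_minutes * 60 + copy_seconds) / 3600.0

def mttdl_years(nodes, disks_per_node, disk_tb, fill,
                node_offline_per_year, drive_afr, block_mb=128):
    """Rough mean time to data loss with 3x replication: loss happens when,
    during the window above, the two disks holding the other replicas of
    some block on the offline node both fail as well."""
    window_h = rereplication_hours(nodes, disks_per_node, disk_tb, fill)
    p_disk = drive_afr * window_h / HOURS_PER_YEAR   # one given disk dies in the window
    blocks = disks_per_node * disk_tb * 1e12 * fill / (block_mb * 1e6)
    p_loss = 1 - (1 - p_disk ** 2) ** blocks         # some block loses both other replicas
    events_per_year = nodes * node_offline_per_year
    return 1.0 / (events_per_year * p_loss)

# Example: 100 nodes, 12 x 2TB disks, 60% full, each node offline 4x/year, 5% drive AFR
print(mttdl_years(100, 12, 2, 0.6, node_offline_per_year=4, drive_afr=0.05))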

On Thu, Jun 30, 2011 at 4:18 PM, Aaron Eng <[EMAIL PROTECTED]> wrote:

> > Keeping the amount of disks per node low and the amount of nodes high should keep the impact of dead nodes in control.
>
> It keeps the impact of dead nodes in control, but I don't think that's long-term cost efficient.  As prices of 10GbE go down, the "keep the node small" argument seems less fitting.  And on another note, most servers manufactured in the last 10 years have dual 1GbE network interfaces.  If one were to go by these calcs:
>
> > 150 nodes with four 2TB disks each, with HDFS 60% full, it takes around ~32 minutes to recover
>
> It seems like that assumes a single 1GbE interface, so why not leverage the second?
>
> On Thu, Jun 30, 2011 at 2:31 PM, Evert Lammerts <[EMAIL PROTECTED]> wrote:
>
> > > You can get 12-24 TB in a server today, which means the loss of a server generates a lot of traffic -which argues for 10 GbE.
> > >
> > > But
> > >   -big increase in switch cost, especially if you (CoI warning) go with Cisco
> > >   -there have been problems with things like BIOS PXE and lights out management on 10 GbE -probably due to the NICs being things the BIOS wasn't expecting and off the mainboard. This should improve.
> > >   -I don't know how well Linux works with ether that fast (field reports useful)
> > >   -the big threat is still ToR switch failure, as that will trigger a re-replication of every block in the rack.
> >
> > Keeping the amount of disks per node low and the amount of nodes high should keep the impact of dead nodes in control. A ToR switch failing is different - missing 30 nodes (~120TB) at once cannot be fixed by adding more nodes; adding nodes actually increases the chance of a ToR switch failure. Although such a failure is quite rare to begin with, I guess. The back-of-the-envelope calculation I made suggests that ~150 (1U) nodes should be fine with 1Gb ethernet. (E.g., when 6 nodes fail in a cluster of 150 nodes, each with four 2TB disks, with HDFS 60% full, it takes around ~32 minutes to recover. 2 nodes failing should take around 640 seconds. Also see the attached spreadsheet.) This doesn't take ToR switch failure into account though. On the other hand - 150 nodes is only ~5 racks - in such a scenario you might rather want to shut the system down completely rather than letting it replicate 20% of all data.
> >
> > Cheers,
> > Evert
>
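
The spreadsheet mentioned above did not survive in the archive, but the quoted ~32 minute and ~640 second figures can be reproduced with a very simple calculation if one assumes roughly 100 MB/s of effective re-replication throughput per node (an assumed figure, about what a single 1GbE link delivers in practice). The same sketch also illustrates Aaron's point: bonding the second 1GbE interface roughly halves the recovery time.

# Back-of-the-envelope HDFS re-replication time after losing several nodes.
# The 100 MB/s effective per-node throughput is an assumption that happens to
# reproduce the ~32 min and ~640 s figures quoted above.

def recovery_seconds(total_nodes, failed_nodes, disks_per_node, disk_tb,
                     hdfs_fill, mb_per_sec_per_node=100):
    """Seconds to re-create the replicas held by the failed nodes, assuming
    the copy work is spread evenly over roughly the full node count."""
    lost_bytes = failed_nodes * disks_per_node * disk_tb * 1e12 * hdfs_fill
    per_node_bytes = lost_bytes / total_nodes
    return per_node_bytes / (mb_per_sec_per_node * 1e6)

# 6 of 150 nodes down, four 2TB disks each, HDFS 60% full -> 1920 s (~32 min)
print(recovery_seconds(150, 6, 4, 2, 0.6))
# 2 nodes down -> 640 s
print(recovery_seconds(150, 2, 4, 2, 0.6))
# Same 6-node failure with two bonded 1GbE links (~200 MB/s) -> ~960 s
print(recovery_seconds(150, 6, 4, 2, 0.6, mb_per_sec_per_node=200))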