Re: HDFS drive, partition best practice
On Mon, Feb 7, 2011 at 2:06 PM, Jonathan Disher <[EMAIL PROTECTED]> wrote:

> I've never seen an implementation of concat volumes that tolerate a disk
> failure, just like RAID0.
>
> Currently I have a 48 node cluster using Dell R710's with 12 disks - two
> 250GB SATA drives in RAID1 for OS, and ten 1TB SATA disks as a JBOD (mounted
> on /data/0 through /data/9) and listed separately in hdfs-site.xml.  It
> works... mostly.  The big issue you will encounter is losing a disk - the
> DataNode process will crash, and if you comment out the affected drive, when
> you replace it you will have 9 disks full to N% and one empty disk.  The DFS
> balancer cannot fix this - usually when I have data nodes down more than an
> hour, I format all drives in the box and rebalance.
>
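For reference, a layout like the one Jonathan describes is normally expressed by listing each JBOD mount point, comma-separated, in dfs.data.dir. A minimal hdfs-site.xml sketch, assuming the /data/0 through /data/9 mounts above (the failed-volumes property only exists in newer 0.20-era builds, so treat it as optional):

    <!-- hdfs-site.xml: one entry per JBOD disk, comma-separated -->
    <property>
      <name>dfs.data.dir</name>
      <value>/data/0,/data/1,/data/2,/data/3,/data/4,/data/5,/data/6,/data/7,/data/8,/data/9</value>
    </property>
    <!-- Optional, where the running version supports it: let the DataNode
         keep running after losing this many volumes instead of crashing. -->
    <property>
      <name>dfs.datanode.failed.volumes.tolerated</name>
      <value>1</value>
    </property>
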
You can manually move blocks around between volumes in a single node (while
the DN is not running). It would be great someday if the DN managed this
automatically.
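For anyone trying that manual shuffle, a rough sketch (paths assume a 0.20-style layout where each data directory keeps its blocks under current/; adjust to your version, and always keep a block's .meta file with it):

    # Stop the DataNode before touching block files.
    $HADOOP_HOME/bin/hadoop-daemon.sh stop datanode

    # Move a block and its matching .meta file together from a full volume
    # to the empty one; the DataNode rescans dfs.data.dir on startup.
    # (blk_1234567890 is a placeholder id.)
    mv /data/3/current/blk_1234567890* /data/9/current/

    $HADOOP_HOME/bin/hadoop-daemon.sh start datanode
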
>
> We are building a new cluster aimed primarily at storage - we will be using
> SuperMicro 4U machines with 36 2TB SATA disks in three RAID6 volumes (for
> roughly 20TB usable per volume, 60 total), plus two SSD's for OS.  At this
> scale, JBOD is going to kill you (rebalancing 40-50TB, even when I bond 8
> gigE interfaces together, takes too long).
>

Interesting. It's not JBOD that kills you there, it's the fact that you have
72TB of data on a single box. You mitigate the failure risks by going RAID6,
but you're still going to have block movement issues by going with high-storage
DataNodes if/when they fail for any reason.
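On the rebalancing time, note that the stock balancer is deliberately throttled, which is a large part of why shuffling tens of TB is slow. A hedged sketch of the usual knobs (property name per the 0.20/1.x docs; check your version):

    # Run the balancer until DataNode usage is within 10% of the cluster average.
    $HADOOP_HOME/bin/hadoop balancer -threshold 10

    # The per-node copy rate is capped by dfs.balance.bandwidthPerSec in
    # hdfs-site.xml (default is about 1 MB/s in 0.20-era releases), so raise
    # it there if the network and disks can take more.
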
>
> -j
>
> On Feb 7, 2011, at 12:25 PM, John Buchanan wrote:
>
> Hello,
>
> My company will be building a small but quickly growing Hadoop deployment,
> and I had a question regarding best practice for configuring the storage for
> the datanodes.  Cloudera has a page where they recommend a JBOD
> configuration over RAID.  My question, though, is whether they are referring
> to the simplest definition of JBOD, that being literally just a collection
> of heterogeneous drives, each with its own distinct partition and mount
> point?  Or are they referring to a concatenated span of heterogeneous drives
> presented to the OS as a single device?
>
> Through some digging I've discovered that data volumes may be specified in
> a comma-delimited fashion in the hdfs-site.xml file and are then accessed
> individually, but are of course all available within the pool.  To test this
> I brought an Ubuntu Server 10.04 VM online (on Xen Cloud Platform) with 3
> storage volumes.  The first is the OS; I created a single partition on each of
> the second and third, mounting them as /hadoop-datastore/a and
> /hadoop-datastore/b respectively, specifying them in hdfs-site.xml in
> comma-delimited fashion.  I then proceeded to construct a single-node
> pseudo-distributed install, executed the bin/start-all.sh script, and all
> seems just great.  The volumes are 5GB each, and the HDFS status page shows a
> configured capacity of 9.84GB, so both are in use, and I successfully added a
> file using bin/hadoop dfs -put.
>
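For anyone wanting to reproduce a quick two-volume test like John's, something along these lines should work on a 0.20-era pseudo-distributed install (mount points taken from his message; the rest is a sketch, not his exact steps):

    # hdfs-site.xml lists both volumes, comma-separated:
    #   <property>
    #     <name>dfs.data.dir</name>
    #     <value>/hadoop-datastore/a,/hadoop-datastore/b</value>
    #   </property>

    $HADOOP_HOME/bin/hadoop namenode -format   # one-time, new cluster only
    $HADOOP_HOME/bin/start-all.sh              # start NameNode, DataNode, JT, TT

    # Put a file in and confirm both volumes count toward configured capacity.
    $HADOOP_HOME/bin/hadoop dfs -put /etc/hosts /hosts-test
    $HADOOP_HOME/bin/hadoop dfsadmin -report
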
> This led me to think that perhaps an optimal datanode configuration would
> be 2 drives in RAID1 for OS, then 2-4 additional drives for data,
> individually partitioned, mounted, and configured in hdfs-site.xml.
> Mirrored system drives would make my node more robust, but data drives would
> still be independent.  I do realize that HDFS assures data redundancy at a
> higher level by design, but having the loss of a single drive necessitate
> rebuilding an entire node, and therefore being down in capacity during that
> period, just doesn't seem to be the most efficient approach.
>
> Would love to hear what others are doing in this regard, whether anyone is
> using concatenated disks and whether the loss of a drive requires them to
> rebuild the entire system.
>
> John Buchanan
>
>
>