Re: HDFS drive, partition best practice

On Feb 8, 2011, at 7:20 AM, John Buchanan wrote:
> What we were thinking for our first deployment was 10 HP DL385's each with
> 8 2TB SATA drives.  First pair in RAID 1 for the system drive, the
> remaining each containing a distinct partition and mount point, then
> specified in hdfs-site.xml in comma-delimited fashion.  Seems to make more
> sense to use RAID at least for the system drives so the loss of 1 drive
> won't take down the entire node.  Granted data integrity wouldn't be
> affected but how much time do you want to spend rebuilding an entire node
> due to the loss of one drive?  Considered using a smaller pair for the
> system drives but if they're all the same then we only need to stock one
> type of spare drive.

Don't bother RAID'ing the system drive.  Seriously.  You're giving up performance for something that rarely happens.  If you have decent configuration management, rebuilding a node is not a big deal and doesn't take that long anyway.

Besides, losing one of the JBOD disks will likely bring the node down anyway.
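
For reference, the comma-delimited layout described in the quoted message could be generated along these lines. This is only a sketch: the mount point paths are made up for illustration, and the property name assumes the `dfs.data.dir` setting used by Hadoop releases of that era (later releases call it `dfs.datanode.data.dir`), so check it against the version actually deployed.

```python
from xml.sax.saxutils import escape

# Hypothetical mount points, one per data drive (six data drives in the
# proposed 8-drive layout). These paths are illustrative, not from the thread.
mount_points = [f"/data/{i}/dfs/dn" for i in range(1, 7)]


def data_dir_property(dirs, name="dfs.data.dir"):
    """Render an hdfs-site.xml <property> whose value is a comma-delimited list.

    The default property name is an assumption; newer Hadoop versions
    use dfs.datanode.data.dir instead.
    """
    value = ",".join(dirs)  # no spaces between entries
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{escape(value)}</value>\n"
        "</property>"
    )


print(data_dir_property(mount_points))
```

Generating the fragment from a list of mount points fits the configuration-management approach mentioned above, since the same template can be pushed to every node.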

> Another question I have is whether using 1TB drives would be advisable
> over 2TB for the purpose of reducing rebuild time.  

You're overthinking the rebuild time.  Again, configuration management makes this a non-issue.

> Or perhaps I'm still
> thinking of this as I would a RAID volume.  If we needed to rebalance
> across the cluster would the time needed be more dependent on the amount
> of data involved and the connectivity between nodes?

Yes.

When a node goes down, the data and tasks are automatically moved.  So a node can be down for as long as it needs to be down.  The grid will still be functional.  So don't panic if a compute node goes down. :)
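
To put rough numbers on the 1TB versus 2TB question, here is a back-of-envelope sketch of how long the cluster might take to re-replicate a dead node's blocks. Everything in it is an assumption for illustration: it treats the lost node's drives as full, spreads the copy work evenly over the surviving nodes, and uses a made-up 50 MB/s of usable bandwidth per node. It ignores any throttling HDFS applies, so treat the result as a lower bound at best.

```python
def rereplication_hours(data_tb, surviving_nodes, per_node_mb_per_s):
    """Rough estimate of the hours needed to re-copy a lost node's data.

    Assumes the work is spread evenly across the surviving nodes and
    that each one sustains per_node_mb_per_s for the whole copy.
    """
    data_mb = data_tb * 1_000_000  # decimal units: 1 TB = 10^6 MB
    aggregate_mb_per_s = surviving_nodes * per_node_mb_per_s
    return data_mb / aggregate_mb_per_s / 3600


# 10-node cluster with one node lost; 50 MB/s per node is an assumed figure.
for drive_tb in (1, 2):
    node_data_tb = 6 * drive_tb  # six full data drives, worst case
    hours = rereplication_hours(node_data_tb, surviving_nodes=9, per_node_mb_per_s=50)
    print(f"{drive_tb} TB drives: ~{hours:.1f} h to re-replicate {node_data_tb} TB")
```

The estimate scales with the amount of data and inversely with the available bandwidth, which is the point of the "Yes" above: drive size matters only through how much data actually sits on the node.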