
John Buchanan 2011-02-07, 20:25
Jonathan Disher 2011-02-07, 22:06
Scott Golby 2011-02-07, 22:40
John Buchanan 2011-02-08, 15:20
Re: HDFS drive, partition best practice

On Feb 8, 2011, at 7:20 AM, John Buchanan wrote:
> What we were thinking for our first deployment was 10 HP DL385's each with
> 8 2TB SATA drives.  First pair in Raid1 for the system drive, the
> remaining each containing a distinct partition and mount point, then
> specified in hdfs-site.xml in comma-delimited fashion.  Seems to make more
> sense to use Raid at least for the system drives so the loss of 1 drive
> won't take down the entire node.  Granted data integrity wouldn't be
> affected but how much time do you want to spend rebuilding an entire node
> due to the loss of one drive.  Considered using a smaller pair for the
> system drives but if they're all the same then we only need to stock one
> type of spare drive.
Don't bother RAID'ing the system drive.  Seriously.  You're giving up performance for something that rarely happens.  If you have decent configuration management, rebuilding a node is not a big deal and doesn't take that long anyway.  

Besides, losing one of the JBOD disks will likely bring the node down anyway.
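
For reference, the comma-delimited layout John describes (one mount point per data disk, listed in hdfs-site.xml) could be generated roughly like this. It is only a sketch: the /data/N mount points and the hdfs/data subdirectory are made-up names, and dfs.data.dir is the property name used by Hadoop releases of this era (later releases call it dfs.datanode.data.dir).

```python
from xml.sax.saxutils import escape


def data_dir_property(mount_points, subdir="hdfs/data", name="dfs.data.dir"):
    """Build an hdfs-site.xml <property> block with one data directory per disk.

    mount_points: iterable of mount point paths, one per JBOD disk (hypothetical).
    subdir:       directory under each mount point used for block storage (assumed).
    name:         property name; dfs.data.dir on older releases.
    """
    # One directory per physical disk, joined with commas and no spaces.
    dirs = [f"{m.rstrip('/')}/{subdir}" for m in mount_points]
    value = ",".join(dirs)
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{escape(value)}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    # 8 drives per node minus the 2 system drives leaves 6 data disks.
    mounts = [f"/data/{i}" for i in range(1, 7)]
    print(data_dir_property(mounts))
```

The DataNode spreads blocks across the listed directories, so each entry should sit on its own physical disk.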

> Another question I have is whether using 1TB drives would be advisable
> over 2TB for the purpose of reducing rebuild time.  

You're overthinking the rebuild time.  Again, configuration management makes this a non-issue.
> Or perhaps I'm still
> thinking of this as I would a Raid volume.  If we needed to rebalance
> across the cluster would the time needed be more dependent on the amount
> of data involved and the connectivity between nodes?


When a node goes down, its blocks are automatically re-replicated from the surviving copies and its tasks are rescheduled on other nodes.  So a node can be down for as long as it needs to be down.  The grid will still be functional.  So don't panic if a compute node goes down. :)
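
To put rough numbers on John's question about data volume and connectivity, here is a back-of-the-envelope sketch. The function name and the 50 MB/s per-node figure are assumptions for illustration, not measurements. It also assumes the copy work is spread evenly across the remaining nodes. HDFS throttles re-replication, so treat the result as a lower bound.

```python
def rereplication_hours(data_tb, nodes_remaining, per_node_mb_s):
    """Rough lower bound, in hours, to re-replicate the data a dead node held.

    data_tb:         terabytes (decimal) stored on the failed node.
    nodes_remaining: number of surviving nodes sharing the copy work.
    per_node_mb_s:   assumed sustained copy rate per node, in MB/s.
    """
    if nodes_remaining < 1 or per_node_mb_s <= 0:
        raise ValueError("need at least one node and a positive copy rate")
    total_mb = data_tb * 1_000_000           # decimal TB -> MB
    aggregate_mb_s = nodes_remaining * per_node_mb_s
    return total_mb / aggregate_mb_s / 3600  # seconds -> hours


if __name__ == "__main__":
    # A full node with six 2TB data disks, nine surviving nodes,
    # and an assumed 50 MB/s of copy throughput per node.
    for tb in (6, 12):
        hours = rereplication_hours(tb, nodes_remaining=9, per_node_mb_s=50)
        print(f"{tb} TB on the failed node: about {hours:.1f} hours")
```

Under these assumptions the time scales linearly with the amount of data on the failed node and inversely with the aggregate bandwidth of the rest of the cluster, so 1TB drives halve the copy time only because they halve the data per node.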
Adam Phelps 2011-02-08, 19:33
Allen Wittenauer 2011-02-08, 20:09
Patrick Angeles 2011-02-08, 20:17
Patrick Angeles 2011-02-08, 20:22
Allen Wittenauer 2011-02-08, 20:43
Mag Gam 2011-02-22, 12:34
Patrick Angeles 2011-02-08, 19:53
Bharath Mundlapudi 2011-02-08, 19:10