Hadoop >> mail # general >> Dedicated disk for operating system


Oded Rosen 2011-08-10, 09:22
Re: Dedicated disk for operating system

On Aug 10, 2011, at 2:22 AM, Oded Rosen wrote:

> Hi,
> What is the best practice regarding disk allocation on Hadoop data nodes?
> We plan on having multiple storage disks per node, and we want to know if we should set aside a smaller, separate disk for the OS (CentOS).
> Is that the suggested configuration, or is it OK to let the OS reside on one of the HDFS storage disks?

It's a waste to put the OS on a separate disk.  Every spindle adds performance, especially for MR spills.

I'm currently configuring:

disk 1 - os, swap, app area, MR spill space, HDFS space
disk 2 through n - swap, MR spill space, HDFS space
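
For what it's worth, here is a minimal Python sketch of how that layout might turn into the directory lists Hadoop wants. The mount points (/disk1 through /diskN) and the hdfs/ and mapred/ subdirectory names are made up for illustration. `dfs.data.dir` and `mapred.local.dir` are the 0.20-era property names, and both accept a comma-separated list of directories.

```python
def hadoop_dir_lists(num_disks, mount_prefix="/disk"):
    """Build comma-separated directory lists, one entry per disk.

    Assumes each disk N carries two separate partitions mounted at
    <mount_prefix>N/hdfs and <mount_prefix>N/mapred (hypothetical paths).
    """
    if num_disks < 1:
        raise ValueError("need at least one disk")
    disks = range(1, num_disks + 1)
    return {
        # HDFS block storage, one directory per spindle
        "dfs.data.dir": ",".join(f"{mount_prefix}{n}/hdfs" for n in disks),
        # MR spill space, one directory per spindle
        "mapred.local.dir": ",".join(f"{mount_prefix}{n}/mapred" for n in disks),
    }


if __name__ == "__main__":
    for name, value in hadoop_dir_lists(4).items():
        print(f"{name} = {value}")
```

The point is only that every disk shows up in both lists, so both HDFS and MR spills get spread across all spindles.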

The usual reason people give for putting the OS on a separate disk is that it makes upgrades easier, since you won't have to touch the application.  The reality is that you're going to blow away the entire machine during an upgrade anyway.  So don't worry about this situation.

I know a lot of people combine the MR spill space and HDFS space onto the same partition, but I've found that keeping them separate has two advantages:

* No longer have to deal with the stupid math that HDFS uses for reservation--no question as to how much space one actually has
* A hard limit on MR space kills badly written jobs before they eat up enough space to nuke HDFS
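
To make those two points concrete, here is a toy model in Python. It is deliberately simplified and is not the accounting HDFS actually does. `reserved_gb` stands in for a `dfs.datanode.du.reserved`-style setting, and all function names and numbers are invented for illustration.

```python
def hdfs_free_shared(partition_gb, reserved_gb, hdfs_used_gb, spill_used_gb):
    """Space HDFS can still use when blocks and spills share one partition.

    Toy model: HDFS thinks it has (partition - reserved - hdfs_used), but
    the disk only really has (partition - hdfs_used - spill_used) left.
    The smaller of the two wins, floored at zero.
    """
    believed = partition_gb - reserved_gb - hdfs_used_gb
    actual = partition_gb - hdfs_used_gb - spill_used_gb
    return max(0, min(believed, actual))


def hdfs_free_separate(hdfs_partition_gb, hdfs_used_gb):
    """Space left on a dedicated HDFS partition; spills cannot touch it."""
    return max(0, hdfs_partition_gb - hdfs_used_gb)


def spill_fits(spill_partition_gb, spill_used_gb, spill_wanted_gb):
    """True if the next spill fits; if not, the job dies on a full disk."""
    return spill_used_gb + spill_wanted_gb <= spill_partition_gb


if __name__ == "__main__":
    # Shared 2000 GB partition, 200 GB reserved, 1500 GB of blocks.
    for spill in (100, 400, 600):
        print("shared, spill", spill, "->", hdfs_free_shared(2000, 200, 1500, spill))
    # Same disk split into 1800 GB for HDFS and 200 GB for spills.
    print("separate ->", hdfs_free_separate(1800, 1500))
    print("runaway spill fits?", spill_fits(200, 150, 100))
```

In the shared case the answer depends on what the jobs happen to be doing at the moment. In the split case the HDFS number is fixed and the runaway job is the one that fails.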

Of course, the big disadvantage is that one needs to calculate the correct space needed, and that's a toughie.  But if you know your applications, it's not a problem.  Besides, if you get it wrong, you can always do a rolling re-install to fix it.
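
A rough sketch of that calculation in Python, assuming you have some estimate of the peak spill per disk from watching your own jobs. The 1.5x headroom factor, the `other_gb` allowance for OS, swap, and app area, and the function name are all assumptions for illustration, not recommendations.

```python
def split_disk(disk_gb, peak_spill_gb, headroom=1.5, other_gb=0):
    """Split one disk into (spill_gb, hdfs_gb).

    peak_spill_gb: observed worst-case MR spill on this disk (your estimate).
    headroom:      safety multiplier on top of that estimate (assumed value).
    other_gb:      space set aside for OS, swap, app area on this disk.
    """
    spill_gb = peak_spill_gb * headroom
    hdfs_gb = disk_gb - other_gb - spill_gb
    if hdfs_gb <= 0:
        raise ValueError("nothing left for HDFS; revisit the estimates")
    return spill_gb, hdfs_gb


if __name__ == "__main__":
    # disk 1 also carries the OS and app area; the other disks only carry swap
    print("disk 1:", split_disk(2000, 100, other_gb=50))
    print("disk 2..n:", split_disk(2000, 100, other_gb=10))
```

If the estimate turns out wrong, changing the numbers and doing the rolling re-install is the fix.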

Also note that in this configuration one cannot take advantage of the "keep the machine up at all costs" features in newer Hadoop releases, which require that root, swap, and the log area be mirrored to be truly effective.  I'm not quite convinced that those features are worth it yet for anything smaller than maybe a 12 disk config.

Oded Rosen 2011-08-10, 14:25
Evert Lammerts 2011-08-10, 14:56
Scott Carey 2011-08-10, 17:24
Ted Dunning 2011-08-10, 17:40
Luke Lu 2011-08-10, 19:19
Brian Bockelman 2011-08-10, 19:31
Ted Dunning 2011-08-10, 19:44
Steve Loughran 2011-08-13, 19:23
Ted Dunning 2011-08-10, 19:49
Rajiv Chittajallu 2011-08-11, 00:15
Ted Dunning 2011-08-11, 06:13
Steve Loughran 2011-08-13, 19:30
Allen Wittenauer 2011-08-10, 19:04
Scott Carey 2011-08-10, 17:40