Hadoop >> mail # general >> Dedicated disk for operating system


Re: Dedicated disk for operating system

On Aug 10, 2011, at 7:56 AM, Evert Lammerts wrote:

> A short, slightly off-topic question:
>
>>      Also note that in this configuration one cannot take
>> advantage of the "keep the machine up at all costs" features in newer
>> Hadoop releases, which require that root, swap, and the log area be
>> mirrored to be truly effective.  I'm not quite convinced that those
>> features are worth it yet for anything smaller than maybe a 12 disk config.
>
> Dell and Cloudera promote the C2100. I'd like to see the calculations behind that config.

If Dell is shipping the same box they shipped us to test a few months ago, the performance was pretty horrid vs. almost all of their competitors.  The main problem was the controller--it was built for RAID, not for JBOD.  (... and then there is the OOB support...)

> Am I wrong in thinking that keeping your cluster up with such dense nodes will only work if you have many (order of magnitude 100+) of them, interconnected with 10Gb Ethernet? If you don't, then recovery times from failing disks / rack switches are going to get crazy, right?

If one assumes that a bunch of nodes are failing at once, yes.  The irony is that ops teams tend to group repairs, so keeping them up might actually be the wrong thing in relation to actual practice.
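
A rough back-of-the-envelope sketch of the recovery-time concern, in Python. Everything here is an assumption made for illustration, not how HDFS actually schedules re-replication: the function name, the `share` parameter (the fraction of each NIC given over to recovery), and the idea that a dead node's blocks get re-copied evenly across all surviving nodes.

```python
def rereplication_hours(node_tb, nodes, nic_gbit, share=0.5):
    """Rough time (hours) to re-replicate one failed node's data.

    Assumptions (hypothetical, for illustration only):
      - the lost blocks are re-copied evenly across the surviving nodes
      - each survivor spends `share` of its NIC bandwidth on recovery
      - disks and switches are never the bottleneck
    """
    if nodes < 2:
        raise ValueError("need at least one surviving node")
    data_bits = node_tb * 1e12 * 8             # data held by the failed node
    per_node_bits = data_bits / (nodes - 1)    # each survivor's portion
    usable_bits_per_s = nic_gbit * 1e9 * share
    return per_node_bits / usable_bits_per_s / 3600


if __name__ == "__main__":
    # Dense 24 TB nodes: a small 1Gb cluster vs. a large 10Gb cluster.
    for nodes, nic in [(10, 1), (100, 1), (100, 10)]:
        hours = rereplication_hours(24, nodes, nic)
        print(f"{nodes:>3} nodes @ {nic:>2} Gbit: {hours:.2f} h")
```

Under these assumptions the time for a single failure shrinks with both node count and link speed. Several dense nodes failing at once, or a rack switch going down, multiplies the data to move while reducing the number of survivors that can help.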

> If you want to get bang for buck, don't the proportions "disk IO / processing power", "node storage capacity / ethernet speed" and "total number of nodes / ethernet speed" indicate many small nodes with not too many disks and 1Gb Ethernet?

The biggest constraint is almost always RAM, as you can use it to help with the rest.