Hadoop >> mail # user >> Disks RAID best practice


Re: Disks RAID best practice
Oleg, that was my overall RAID preference.

Specifically for the 'control nodes', aka NN, SN, JT, HM, ZK...

I tend to just use simple mirroring (RAID-1) because these processes are not really I/O bound.
I guess you could go RAID-10 (striped and mirrored), but that may be a little overkill, and my preference comes from working in the RDBMS world.

If we are using commodity servers, JBOD tends to be the preferred way of handling things.

However, I've seen cases where people will use RAIDed drives on a node for a couple of reasons. The nice thing about doing mirrored DN drives is that if you have a disk failure, you just pop the drive and replace it. Much simpler.

If we're looking at using NetApp's E Series in conjunction with a compute cluster, then you are using their RAIDed configuration and can reduce the cluster's replication factor from 3 to 2.

While it's easy to recommend RAID on the control nodes, data nodes are a bit trickier. You can run with straight JBOD, and on cost alone it's the cheapest option in terms of hardware. If you go with RAID on the DN, you reduce your storage density per node because you have redundancy in hardware, and this has an impact on your overall machine density and TCO. This is offset by easier and faster recovery from some hardware failure events. Let's face it, the number one thing to fail is going to be your hard drives. So we are going to have to balance the costs against the benefits.
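
To put rough numbers on that density trade-off, here's a minimal sketch. The node size (12 x 2 TB drives) and the function name are made up for illustration, and it ignores OS/temp space and filesystem overhead, so treat it as back-of-the-envelope only:

```python
def logical_capacity_tb(disks, disk_tb, raid_efficiency, replication):
    """Rough logical HDFS capacity one data node contributes, in TB.

    raid_efficiency is the usable fraction of raw disk after hardware
    redundancy: 1.0 for JBOD, 0.5 for mirroring (RAID-1 / RAID-10).
    replication is the HDFS replication factor.
    """
    if not 0 < raid_efficiency <= 1:
        raise ValueError("raid_efficiency must be in (0, 1]")
    if replication < 1:
        raise ValueError("replication must be at least 1")
    raw_tb = disks * disk_tb
    usable_tb = raw_tb * raid_efficiency
    # Every logical block is stored `replication` times across the cluster.
    return usable_tb / replication


# Hypothetical node: 12 drives of 2 TB each.
jbod_r3 = logical_capacity_tb(12, 2.0, 1.0, 3)      # JBOD, replication 3
mirror_r3 = logical_capacity_tb(12, 2.0, 0.5, 3)    # mirrored, replication 3
mirror_r2 = logical_capacity_tb(12, 2.0, 0.5, 2)    # mirrored, replication 2

print(jbod_r3, mirror_r3, mirror_r2)  # 8.0 4.0 6.0
```

With these assumed numbers, mirroring the DN drives halves what each node contributes at the same replication factor, and dropping replication to 2 only claws part of that back. Whether the simpler drive swaps are worth it is the cost/benefit call.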

Now I have to state the obvious caveats... 1) YMMV, 2) the factors which go into the cluster design decision are going to be unique to the company setting up the cluster.

These are IMHO, and you know what they say about opinions... ;-)

HTH
-Mike

On Nov 1, 2012, at 7:52 AM, Oleg Ruchovets <[EMAIL PROTECTED]> wrote:

> Do you mean RAID 10 for Master Node?
> What about DataNode?
>
> Thanks
> Oleg.
>
>
>
> On Thu, Nov 1, 2012 at 2:43 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
>
>> I prefer RAID 10, but some say RAID 6.
>>
>> I thought NetApp used RAID 6 ?
>>
>> It's definitely an interesting discussion point though.
>>
>> -Mike
>>
>> On Nov 1, 2012, at 7:37 AM, Oleg Ruchovets <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>  What is the best practice for disk RAID (Master and Data Nodes)?
>>> Thanks in advance
>>> Oleg.
>>
>>