HDFS >> mail # user >> Multiple dfs.data.dir vs RAID0


Jean-Marc Spaggiari 2013-02-11, 01:57
Re: Multiple dfs.data.dir vs RAID0
One thought comes to mind: disk failure. If a disk goes bad, then
with RAID0 you lose the entire array. With JBOD, you lose only that
one disk.
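
To put rough numbers on that, here is a minimal Python sketch, not anything taken from Hadoop itself. The function name `capacity_lost_tb` and the layout labels are made up for illustration, and the disk sizes are the 2 x 2TB drives from the question below. It only counts how much local storage disappears when one disk dies.

```python
def capacity_lost_tb(disk_sizes_tb, failed_index, layout):
    """Return how many TB of local storage are lost when one disk fails.

    Hypothetical helper for illustration only; it models raw capacity,
    not HDFS replication or recovery.
    """
    if layout == "raid0":
        # RAID0 stripes data across every member disk, so a single
        # failed member makes the whole array unreadable.
        return sum(disk_sizes_tb)
    if layout == "jbod":
        # With separate mount points, only the failed disk's data is gone.
        return disk_sizes_tb[failed_index]
    raise ValueError("unknown layout: " + layout)


disks = [2, 2]  # 2 x 2TB drives, as in the original question
print(capacity_lost_tb(disks, 0, "raid0"))  # 4
print(capacity_lost_tb(disks, 0, "jbod"))   # 2
```

With more disks per node the gap widens, since the RAID0 figure grows with the array while the JBOD figure stays at one disk.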

-Michael

On Feb 10, 2013, at 8:58 PM, Jean-Marc Spaggiari
<[EMAIL PROTECTED]> wrote:

> Hi,
>
> I have a quick question regarding RAID0 performance vs multiple
> dfs.data.dir entries.
>
> Let's say I have 2 x 2TB drives.
>
> I can configure them as 2 separate drives mounted on 2 folders and
> assign both to Hadoop using dfs.data.dir. Or I can combine the 2 drives
> with RAID0 and assign them as a single folder to dfs.data.dir.
>
> With RAID0, the reads and writes are going to be spread over the 2
> disks, which significantly increases the speed. But if I put 2
> entries in dfs.data.dir, Hadoop is going to spread over those 2
> directories too, and in the end, the results should be the same, no?
>
> Any experience/advice/results to share?
>
> Thanks,
>
> JM
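
One way to picture the question above is the small Python sketch below. It assumes the DataNode hands out new blocks to the dfs.data.dir entries in simple round-robin order; that is an assumption about the placement policy, not something stated in this thread. The function name `place_blocks` and the directory paths are hypothetical.

```python
from collections import Counter
from itertools import cycle


def place_blocks(data_dirs, num_blocks):
    """Count blocks per directory under an assumed round-robin policy."""
    next_dir = cycle(data_dirs)  # endlessly rotate through the directories
    placement = Counter()
    for _ in range(num_blocks):
        # Each whole block lands on exactly one directory (one disk).
        placement[next(next_dir)] += 1
    return placement


dirs = ["/data/1/dfs", "/data/2/dfs"]  # hypothetical mount points
for directory, count in sorted(place_blocks(dirs, 10).items()):
    print(directory, count)
```

Under that assumption the two disks end up equally loaded, much as they would with RAID0. The difference is granularity: here each block lives entirely on one disk, so a single block read or write is served by one spindle, whereas RAID0 stripes every individual read and write across both. Aggregate throughput over many concurrent blocks may therefore be similar, while single-stream speed may not be.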
Jean-Marc Spaggiari 2013-02-11, 02:19
Jean-Marc Spaggiari 2013-02-11, 15:54
Michael Katzenellenbogen 2013-02-11, 16:02
Marcos Ortiz 2013-02-11, 02:39