
Multiple dfs.data.dir vs RAID0

I have a quick question regarding RAID0 performance vs. multiple
dfs.data.dir entries.

Let's say I have 2 x 2TB drives.

I can configure them as 2 separate drives mounted on 2 folders and
assign both folders to Hadoop using dfs.data.dir. Or I can combine the
2 drives with RAID0 and assign them as a single folder to dfs.data.dir.

With RAID0, the reads and writes are going to be spread over the 2
disks, which significantly increases the speed. But if I put 2
entries in dfs.data.dir, Hadoop is going to spread its data over those 2
directories too, so in the end the results should be the same, no?
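
To make the two layouts concrete, here is a rough Python sketch of where the bytes of an HDFS block would land in each case. It is only a toy model, not a benchmark: the 64 KiB stripe size, the 64 MiB block size, and both function names are assumptions chosen for illustration, and the round-robin choice of directory reflects my understanding of the DataNode's default behaviour rather than anything measured here.

```python
from collections import Counter

STRIPE_SIZE = 64 * 1024        # assumed RAID0 chunk size (64 KiB)
BLOCK_SIZE = 64 * 1024 * 1024  # assumed HDFS block size (64 MiB)
NUM_DISKS = 2                  # the 2 x 2TB drives from the question


def raid0_bytes_per_disk(block_size, stripe_size, num_disks):
    """Bytes of ONE block that land on each disk when RAID0 stripes it."""
    per_disk = Counter()
    full_stripes, remainder = divmod(block_size, stripe_size)
    for i in range(full_stripes):
        # Consecutive stripes alternate between the member disks.
        per_disk[i % num_disks] += stripe_size
    if remainder:
        # A trailing partial stripe goes to the next disk in the rotation.
        per_disk[full_stripes % num_disks] += remainder
    return dict(per_disk)


def data_dir_for_block(block_index, num_dirs):
    """Directory index for a WHOLE block, assuming round-robin placement."""
    return block_index % num_dirs


if __name__ == "__main__":
    print("RAID0, one block:", raid0_bytes_per_disk(BLOCK_SIZE, STRIPE_SIZE, NUM_DISKS))
    print("2 data dirs, four blocks:",
          [data_dir_for_block(i, NUM_DISKS) for i in range(4)])
```

In this model a single block is split evenly across both disks under RAID0, while with 2 dfs.data.dir entries each block sits entirely on one disk and successive blocks alternate between the two. Whether that gives the same throughput in practice depends on how many blocks are being read or written at the same time, which the sketch does not try to model.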

Any experience/advice/results to share?


Michael Katzenellenbogen 2013-02-11, 02:12
Jean-Marc Spaggiari 2013-02-11, 02:19
Jean-Marc Spaggiari 2013-02-11, 15:54
Michael Katzenellenbogen 2013-02-11, 16:02
Marcos Ortiz 2013-02-11, 02:39