Multiple dfs.data.dir vs RAID0
Hi,

I have a quick question about RAID0 performance versus multiple
dfs.data.dir entries.

Let's say I have 2 x 2TB drives.

I can configure them as 2 separate drives mounted on 2 folders and
assign both folders to Hadoop using dfs.data.dir. Or I can combine the
2 drives with RAID0 and assign the array to dfs.data.dir as a single folder.
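
To make the two options concrete, here is a rough sketch of what each
might look like in hdfs-site.xml. The mount points (/mnt/disk1,
/mnt/disk2, /mnt/raid0) are placeholders chosen for illustration, not
paths from an actual setup, and only one of the two properties would
appear in a real file.

```xml
<!-- Option 1: two separate mounts, both listed in dfs.data.dir
     as a comma-separated list (paths are hypothetical) -->
<property>
  <name>dfs.data.dir</name>
  <value>/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data</value>
</property>

<!-- Option 2: the two drives combined into one RAID0 array,
     mounted at a single path (path is hypothetical) -->
<property>
  <name>dfs.data.dir</name>
  <value>/mnt/raid0/hdfs/data</value>
</property>
```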

With RAID0, reads and writes are spread over the 2
disks, which significantly increases speed. But if I put 2
entries in dfs.data.dir, Hadoop is going to spread its data over those 2
directories too, so in the end the results should be the same, no?

Any experience/advice/results to share?

Thanks,

JM