HDFS, mail # user - Multiple dfs.data.dir vs RAID0


Re: Multiple dfs.data.dir vs RAID0
Michael Katzenellenbogen 2013-02-11, 02:12
One thought comes to mind: disk failure. If a disk goes bad with
RAID0, you have just lost your entire array. With JBOD, you lose
only that one disk.

-Michael
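To put rough numbers on the disk-failure point, here is a minimal Python sketch. It assumes independent disk failures, data spread evenly across the disks, and a hypothetical per-disk failure probability `p_disk` over some period; the function names are illustrative only. Because HDFS keeps replicas on other nodes, "lost" here normally means the node's local data that the cluster must re-replicate, not lost files.

```python
# Minimal sketch of the disk-failure trade-off: RAID0 stripe vs. separate
# dfs.data.dir directories (JBOD). Assumes independent disk failures, data
# spread evenly across disks, and a hypothetical per-disk failure
# probability p_disk over some period.


def fraction_lost_on_one_failure(num_disks: int, raid0: bool) -> float:
    """Fraction of the node's local data lost when exactly one disk dies."""
    if raid0:
        # Every block is striped across all members, so one dead disk
        # takes the whole array with it.
        return 1.0
    # Each directory is independent; only the failed disk's share is gone.
    return 1.0 / num_disks


def expected_fraction_lost(num_disks: int, p_disk: float, raid0: bool) -> float:
    """Expected fraction of the node's local data lost over the period."""
    if raid0:
        # The array is lost if at least one member fails.
        return 1.0 - (1.0 - p_disk) ** num_disks
    # Each disk fails with probability p_disk and holds 1/num_disks of the data.
    return num_disks * p_disk * fraction_lost_on_one_failure(num_disks, raid0=False)


if __name__ == "__main__":
    for raid0 in (True, False):
        label = "RAID0" if raid0 else "JBOD"
        print(label, f"{expected_fraction_lost(2, 0.05, raid0):.4f}")
```

With 2 disks and an assumed p_disk of 0.05, RAID0 gives an expected loss of about 9.75% of the node's data versus 5% for JBOD, and the RAID0 figure keeps growing with every disk added to the stripe while the JBOD figure stays at p_disk.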

On Feb 10, 2013, at 8:58 PM, Jean-Marc Spaggiari
<[EMAIL PROTECTED]> wrote:

> Hi,
>
> I have a quick question regarding RAID0 performance vs. multiple
> dfs.data.dir entries.
>
> Let's say I have 2 x 2TB drives.
>
> I can configure them as 2 separate drives mounted on 2 folders and
> assign them to Hadoop using dfs.data.dir. Or I can combine the 2 drives
> into a RAID0 array and assign it as a single folder in dfs.data.dir.
>
> With RAID0, reads and writes are spread over the 2 disks, which
> significantly increases speed. But if I put 2 entries in
> dfs.data.dir, Hadoop is going to spread data over those 2
> directories too, so in the end the results should be the same, no?
>
> Any experience/advice/results to share?
>
> Thanks,
>
> JM
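On whether the results end up the same: a DataNode with several dfs.data.dir entries stores each block as a whole file in one of the directories, choosing among them round-robin by default as far as I know, whereas RAID0 stripes every block across all member disks. The Python sketch below contrasts the two layouts. Both functions are hypothetical illustrations, not Hadoop code, and the directory names, the 64 MB block size and the 64 KB stripe size are assumed example values.

```python
# Hypothetical illustration (not Hadoop code): whole-block round-robin
# placement across dfs.data.dir entries vs. RAID0 striping of one block.
from itertools import cycle
from typing import Dict, List


def place_blocks_round_robin(block_ids: List[str], data_dirs: List[str]) -> Dict[str, List[str]]:
    """Store each whole block in one directory, cycling through data_dirs."""
    placement: Dict[str, List[str]] = {d: [] for d in data_dirs}
    for block_id, data_dir in zip(block_ids, cycle(data_dirs)):
        placement[data_dir].append(block_id)
    return placement


def stripe_block_raid0(block_size: int, stripe_size: int, num_disks: int) -> List[int]:
    """Bytes of a single block that land on each disk of a RAID0 array."""
    per_disk = [0] * num_disks
    offset, disk = 0, 0
    while offset < block_size:
        chunk = min(stripe_size, block_size - offset)
        per_disk[disk] += chunk
        offset += chunk
        disk = (disk + 1) % num_disks  # next stripe goes to the next disk
    return per_disk


if __name__ == "__main__":
    blocks = [f"blk_{i}" for i in range(4)]
    print(place_blocks_round_robin(blocks, ["/data/1", "/data/2"]))
    # Assumed sizes: 64 MB HDFS block, 64 KB RAID0 stripe.
    print(stripe_block_raid0(64 * 1024 * 1024, 64 * 1024, 2))
```

With JBOD, a single block read or write touches only one disk, so both disks are busy only when several blocks are in flight at the same time; RAID0 splits even a single block evenly across both disks.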