MapReduce >> mail # user >> Re: Multiple dfs.data.dir vs RAID0


Re: Multiple dfs.data.dir vs RAID0
@Michael:
I ran some tests comparing RAID0, RAID1, JBOD and LVM on another server.

The results are here:
http://www.spaggiari.org/index.php/hbase/hard-drives-performances
LVM and JBOD came out close. That's why I mentioned LVM: it seems
to be pretty close to JBOD performance-wise, and it can be set up on any
hardware, even if the motherboard does not offer a RAID/JBOD option.

@Chris:
I will have to test and see. For example, what if I add a drive now to an
existing DataNode? Is it going to spread its existing data over the 2
drives? Or are they going to grow at the same speed?

I will add one drive to one server tomorrow and see the results...
Then I will run some performance tests and see...
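
The question about adding a drive to an existing DataNode can be explored with a small simulation. This is only a sketch, assuming the DataNode assigns new blocks to its configured directories in round-robin order and never moves blocks that are already on disk. That is a simplification of the real volume selection (which also has to skip volumes without enough free space), and the paths and block counts below are hypothetical.

```python
from itertools import cycle


def place_blocks(used, new_blocks):
    """Assign new blocks to volumes in round-robin order.

    used: dict mapping a volume path to the number of blocks it already holds.
    Returns a new dict with updated counts. Existing blocks are never moved.
    """
    result = dict(used)
    volumes = cycle(sorted(result))  # fixed, repeating order of volumes
    for _ in range(new_blocks):
        result[next(volumes)] += 1
    return result


# Hypothetical DataNode: /data/1 already holds 1000 blocks, /data/2 was just added.
before = {"/data/1": 1000, "/data/2": 0}
after = place_blocks(before, 500)
print(after)
```

Under this assumption the old data stays on the first drive and both drives grow at the same rate from then on, so the new drive does not catch up on its own.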

2013/2/10, Michael Katzenellenbogen <[EMAIL PROTECTED]>:
> Are you able to create multiple RAID0 volumes? Perhaps you can expose
> each disk as its own RAID0 volume...
>
> Not sure why or where LVM comes into the picture here ... LVM is on
> the software layer and (hopefully) the RAID/JBOD stuff is at the
> hardware layer (and in the case of HDFS, LVM will only add unneeded
> overhead).
>
> -Michael
>
> On Feb 10, 2013, at 9:19 PM, Jean-Marc Spaggiari
> <[EMAIL PROTECTED]> wrote:
>
>> The issue is that my MB does not do JBOD :( Only RAID is
>> possible, and I have been fighting with it for the last 48h and still
>> can't make it work... That's why I'm thinking about using dfs.data.dir
>> instead.
>>
>> I have 1 drive per node so far and need to move to 2 to reduce WIO.
>>
>> What would be better with JBOD compared to dfs.data.dir? I have done some
>> tests of JBOD vs LVM and did not find any pros for JBOD so far.
>>
>> JM
>>
>> 2013/2/10, Michael Katzenellenbogen <[EMAIL PROTECTED]>:
>>> One thought comes to mind: disk failure. In the event a disk goes bad,
>>> then with RAID0, you just lost your entire array. With JBOD, you lost
>>> one disk.
>>>
>>> -Michael
>>>
>>> On Feb 10, 2013, at 8:58 PM, Jean-Marc Spaggiari
>>> <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a quick question regarding RAID0 performance vs multiple
>>>> dfs.data.dir entries.
>>>>
>>>> Let's say I have 2 x 2TB drives.
>>>>
>>>> I can configure them as 2 separate drives mounted on 2 folders and
>>>> assign them to Hadoop using dfs.data.dir. Or I can combine the 2 drives
>>>> with RAID0 and assign them as a single folder to dfs.data.dir.
>>>>
>>>> With RAID0, the reads and writes are going to be spread over the 2
>>>> disks. This significantly increases the speed. But if I put 2
>>>> entries in dfs.data.dir, Hadoop is going to spread over those 2
>>>> directories too, and in the end, the results should be the same, no?
>>>>
>>>> Any experience/advice/results to share?
>>>>
>>>> Thanks,
>>>>
>>>> JM
>>>
>
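
Michael's point about disk failure can be put into rough numbers. The sketch below assumes that disks fail independently with the same probability over some period, that RAID0 loses the whole array when any member fails, and that with separate disks (JBOD or one dfs.data.dir entry per disk) a failure only loses the data on that disk. The failure probability used is a made-up illustration, not a measurement.

```python
def expected_loss_fraction(p_disk, n_disks, layout):
    """Expected fraction of a node's local data lost to disk failure.

    p_disk: assumed probability that one disk fails in the period considered.
    n_disks: number of disks in the node.
    layout: "raid0" (any failure loses everything) or "jbod" (per-disk loss).
    """
    if layout == "raid0":
        # The array survives only if every disk survives.
        return 1 - (1 - p_disk) ** n_disks
    if layout == "jbod":
        # Each disk holds 1/n of the data and fails with probability p:
        # n * p * (1/n) = p, independent of the number of disks.
        return p_disk
    raise ValueError(f"unknown layout: {layout}")


P_DISK = 0.05  # hypothetical per-disk failure probability
for n in (2, 4, 8):
    raid0 = expected_loss_fraction(P_DISK, n, "raid0")
    jbod = expected_loss_fraction(P_DISK, n, "jbod")
    print(f"{n} disks: RAID0 {raid0:.4f}, JBOD {jbod:.4f}")
```

With HDFS replication the lost blocks can normally be re-replicated from other nodes either way, so the practical difference is how much data has to be copied again after a single disk failure.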