Hadoop, mail # general - one or more file system


Xiang Hua 2012-10-08, 15:30
Andy Isaacson 2012-10-08, 17:45
Xiang Hua 2012-10-09, 09:13
Andy Isaacson 2012-10-16, 23:45
Re: one or more file system
Arun C Murthy 2012-10-16, 23:55
Can you guys pls move this discussion to user@? Thanks.

On Oct 16, 2012, at 4:45 PM, Andy Isaacson wrote:

> RAID5 is suboptimal for HDFS due to the spindle imbalance issue (among
> other problems). Read this paper for details:
>
> "Disks are like Snowflakes: No Two Are Alike"
> www.usenix.org/event/hotos11/tech/final_files/Krevat.pdf
>
> For best performance configure your storage as JBOD instead of RAID,
> format each spindle as a separate ext4 filesystem, and put a datadir
> on each spindle.
>
> Your disk array will have a configuration utility to set JBOD instead
> of RAID. Please consult the documentation for your disk array for the
> details.
>
> If you must use RAID5 then one filesystem and one datadir is your best option.
>
> For *BAD* performance, put multiple logical volumes on a single RAID
> and put multiple datadirs on the RAID. This will result in low IOPS,
> low throughput, and high contention.
>
> -andy
>
> On Tue, Oct 9, 2012 at 2:13 AM, Xiang Hua <[EMAIL PROTECTED]> wrote:
>> Hi,
>>   But how do we "configure the disk array as JBOD"? We plan to use a disk
>> array with RAID5 and carve it into LUNs of 1T each.
>>   So we have many LUNs of 1T each, and we run mkfs on every LUN, so we
>> have 12 filesystems, /data1 ... /data12, which will be given to HDFS.
>>
>>
>> Best R.
>>
>> beatls
>>
>> On Tue, Oct 9, 2012 at 1:45 AM, Andy Isaacson <[EMAIL PROTECTED]> wrote:
>>
>>> On Mon, Oct 8, 2012 at 8:30 AM, Xiang Hua <[EMAIL PROTECTED]> wrote:
>>>> Hi,
>>>>   We have 4T of disk from a disk array.
>>>>   I want to split each 2T volume into two 1T volumes, then add them to
>>>> HDFS, which leads to more local storage directories.
>>>>   This time we have 12 local directories (1T each). Is it harmful to HDFS
>>>> performance?
>>>
>>> Assuming you're running a modern Hadoop on a recent Linux (2.6.38 or
>>> later, or RHEL6):
>>>
>>> For best performance you should configure your disk array as JBOD
>>> rather than RAID, then put one ext4 filesystem on each spindle. Do not
>>> put multiple storage directories on a single spindle; that results in
>>> very bad performance and no benefit over a single storage directory
>>> per spindle. And do not put multiple spindles under a single storage
>>> directory; that results in poor utilization and bad performance with
>>> no significant benefit.
>>>
>>> 12 local storage directories will perform just fine assuming you have
>>> enough CPU power to use them.
>>>
>>> -andy
>>>

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
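
To illustrate the "one datadir per spindle" layout recommended in this thread, here is a minimal Python sketch that builds the DataNode storage-directory property for hdfs-site.xml from a list of mount points. It assumes each spindle is already formatted and mounted at /data1 ... /data12, as in the question above. The helper name `datadir_property` and the `dfs/data` subdirectory are hypothetical choices made for the example. The property is named `dfs.datanode.data.dir` in Hadoop 2.x and `dfs.data.dir` in Hadoop 1.x, so pass whichever name matches your release.

```python
from xml.sax.saxutils import escape


def datadir_property(mount_points, subdir="dfs/data",
                     name="dfs.datanode.data.dir"):
    """Return an hdfs-site.xml <property> with one data dir per mount point.

    `subdir` is an illustrative choice, not something the thread specifies.
    Use name="dfs.data.dir" for Hadoop 1.x.
    """
    # One storage directory per mount point (ideally one per spindle).
    dirs = [f"{m.rstrip('/')}/{subdir}" for m in mount_points]
    # HDFS expects a comma-separated list with no spaces.
    value = ",".join(dirs)
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{escape(value)}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    # Assumed mount points, matching /data1 ... /data12 from the thread.
    mounts = [f"/data{i}" for i in range(1, 13)]
    print(datadir_property(mounts))
```

The resulting fragment goes inside the `<configuration>` element of hdfs-site.xml on each DataNode. The sketch only generates the text; it does not create directories or check that each mount point is a separate physical disk.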