Hadoop general mailing list: one or more file system

Xiang Hua 2012-10-08, 15:30
Andy Isaacson 2012-10-08, 17:45
Xiang Hua 2012-10-09, 09:13
Andy Isaacson 2012-10-16, 23:45
Re: one or more file system
Can you guys please move this discussion to user@? Thanks.

On Oct 16, 2012, at 4:45 PM, Andy Isaacson wrote:

> RAID5 is suboptimal for HDFS due to the spindle imbalance issue (among
> other problems). Read this paper for details:
> "Disks are like Snowflakes: No Two Are Alike"
> www.usenix.org/event/hotos11/tech/final_files/Krevat.pdf
> For best performance configure your storage as JBOD instead of RAID,
> format each spindle as a separate ext4 filesystem, and put a datadir
> on each spindle.
> Your disk array will have a configuration utility to set JBOD instead
> of RAID. Please consult the documentation for your disk array for the
> details.
> If you must use RAID5, then one filesystem and one datadir is your best option.
> For *BAD* performance, put multiple logical volumes on a single RAID
> and put multiple datadirs on the RAID. This will result in low IOPS,
> low throughput, and high contention.
> -andy
> On Tue, Oct 9, 2012 at 2:13 AM, Xiang Hua <[EMAIL PROTECTED]> wrote:
>> Hi,
>>   But how do we "configure the disk array as JBOD"? We plan to use the
>> disk array with RAID5 and carve it into 1T LUNs,
>>  so we will have many 1T LUNs. We run mkfs on every LUN, so we
>> have 12 filesystems, /data1 ... /data12, which will be added to HDFS.
>> Best R.
>> beatls
>> On Tue, Oct 9, 2012 at 1:45 AM, Andy Isaacson <[EMAIL PROTECTED]> wrote:
>>> On Mon, Oct 8, 2012 at 8:30 AM, Xiang Hua <[EMAIL PROTECTED]> wrote:
>>>> Hi,
>>>>   We have 4T of disk from a disk array.
>>>>   I want to split one 2T volume into two 1T volumes and then add them to
>>>> HDFS, which leads to more local storage directories.
>>>>   In that case we have 12 local directories (1T each). Is that harmful
>>>> to HDFS performance?
>>> Assuming you're running a modern Hadoop on a recent Linux (2.6.38 or
>>> later, or RHEL6):
>>> For best performance you should configure your disk array as JBOD
>>> rather than RAID, then put one ext4 filesystem on each spindle. Do not
>>> put multiple storage directories on a single spindle; that results in
>>> very bad performance and no benefit over a single storage directory
>>> per spindle. And do not put multiple spindles under a single storage
>>> directory; that results in poor utilization and bad performance with
>>> no significant benefit.
>>> 12 local storage directories will perform just fine assuming you have
>>> enough CPU power to use them.
>>> -andy
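The spindle-imbalance argument against RAID5 can be illustrated with a deliberately simplified toy model (an assumption for illustration, not taken from the paper cited above): a striped read touches every member disk and completes only when the slowest disk finishes, whereas under JBOD each disk serves its own HDFS data directory independently. Parity overhead, caching, and seek behavior are ignored, and the per-disk rates below are hypothetical.

```python
"""Toy model: why one slow disk hurts a striped RAID array more than JBOD.

Simplifying assumptions (hypothetical): every striped read touches all member
disks and waits for the slowest one; under JBOD each disk works independently.
RAID5 parity overhead is ignored.
"""


def striped_throughput(disk_mb_s: list[float]) -> float:
    # Each stripe waits for the slowest member, so every disk is effectively
    # throttled down to the slowest disk's rate.
    return len(disk_mb_s) * min(disk_mb_s)


def jbod_throughput(disk_mb_s: list[float]) -> float:
    # Independent spindles: aggregate bandwidth is the sum of per-disk rates.
    return sum(disk_mb_s)


if __name__ == "__main__":
    # Hypothetical per-disk sequential rates in MB/s; one disk is slower.
    disks = [120.0, 120.0, 120.0, 80.0]
    print(f"striped: {striped_throughput(disks):.0f} MB/s")  # 4 * 80 = 320
    print(f"jbod:    {jbod_throughput(disks):.0f} MB/s")     # 120 * 3 + 80 = 440
```

Under these assumptions a single disk running at two thirds speed costs the striped layout 120 MB/s of aggregate bandwidth, while JBOD loses only the 40 MB/s that the slow disk itself is missing.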

Arun C. Murthy
Hortonworks Inc.
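The "one datadir per filesystem" layout discussed in the thread is expressed in hdfs-site.xml as a comma-separated list of directories. The sketch below builds that property for the 12 mounts /data1 ... /data12 mentioned above. It assumes the property name dfs.datanode.data.dir (Hadoop 2.x; Hadoop 1.x uses dfs.data.dir), and the `hdfs/dn` subdirectory plus the helper names `data_dirs` and `hdfs_site_property` are hypothetical.

```python
"""Sketch: build the HDFS data-directory property, one datadir per filesystem.

Assumptions: filesystems are mounted at /data1 ... /data12 (as in the thread);
the "hdfs/dn" subdirectory and these helper names are hypothetical.
"""
from xml.sax.saxutils import escape


def data_dirs(num_mounts: int, prefix: str = "/data", subdir: str = "hdfs/dn") -> list[str]:
    # One storage directory per mounted filesystem (ideally one per spindle).
    return [f"{prefix}{i}/{subdir}" for i in range(1, num_mounts + 1)]


def hdfs_site_property(name: str, dirs: list[str]) -> str:
    # HDFS reads the data directories as a comma-separated list.
    value = ",".join(dirs)
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{escape(value)}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    print(hdfs_site_property("dfs.datanode.data.dir", data_dirs(12)))
```

The printed element would go inside the `<configuration>` element of hdfs-site.xml. Note that if the 12 filesystems are LUNs carved from one RAID5 array, listing them separately does not give per-spindle independence; that is the case the thread warns about.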