Re: one or more file system
RAID5 is suboptimal for HDFS due to the spindle imbalance issue (among
other problems). Read this paper for details:

"Disks are like Snowflakes: No Two Are Alike"
www.usenix.org/event/hotos11/tech/final_files/Krevat.pdf

For best performance configure your storage as JBOD instead of RAID,
format each spindle as a separate ext4 filesystem, and put a datadir
on each spindle.
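
As a rough illustration of "a datadir on each spindle", here is a small
Python sketch that builds the hdfs-site.xml property from a list of mount
points. The /data1 ... /data12 mount points are taken from this thread; the
"dfs/dn" subdirectory and the helper name are assumptions made up for the
example. The property is dfs.datanode.data.dir on Hadoop 2.x and dfs.data.dir
on Hadoop 1.x, so check which one your version expects.

```python
from xml.sax.saxutils import escape


def datadir_property(mount_points, subdir="dfs/dn", name="dfs.datanode.data.dir"):
    """Return an hdfs-site.xml <property> listing one datadir per mount point.

    `subdir` is a hypothetical directory name under each mount point;
    pass name="dfs.data.dir" for Hadoop 1.x.
    """
    # One directory per mount point, joined with commas as HDFS expects.
    dirs = [f"{m.rstrip('/')}/{subdir}" for m in mount_points]
    value = ",".join(dirs)
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{escape(value)}</value>\n"
        "</property>"
    )


# Twelve filesystems, one per spindle, mounted at /data1 ... /data12.
mounts = [f"/data{i}" for i in range(1, 13)]
print(datadir_property(mounts))
```

Paste the printed block into hdfs-site.xml on each DataNode; the point is
simply that every entry in the list lives on its own spindle.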

Your disk array will have a configuration utility to set JBOD instead
of RAID. Please consult the documentation for your disk array for the
details.

If you must use RAID5 then one filesystem and one datadir is your best option.

For *BAD* performance, put multiple logical volumes on a single RAID
and put multiple datadirs on the RAID. This will result in low IOPS,
low throughput, and high contention.
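
A partial sanity check, sketched in Python: group the configured datadirs by
the device ID that os.stat reports and flag any device that backs more than
one of them. The function names are made up for this example. Note the limit
of this check: it only catches datadirs that share a filesystem. Separate
logical volumes carved from one RAID set each report their own device ID, so
that case still has to be checked against the array or LVM configuration.

```python
import os
from collections import defaultdict


def group_by_device(dirs, stat=os.stat):
    """Map each device ID (st_dev) to the datadirs that live on it."""
    groups = defaultdict(list)
    for d in dirs:
        groups[stat(d).st_dev].append(d)
    return dict(groups)


def shared_devices(dirs, stat=os.stat):
    """Return only the devices that back more than one datadir."""
    return {
        dev: ds
        for dev, ds in group_by_device(dirs, stat).items()
        if len(ds) > 1
    }
```

Calling shared_devices(["/data1", "/data2"]) on a DataNode should return an
empty dict when each directory is on its own filesystem.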

-andy

On Tue, Oct 9, 2012 at 2:13 AM, Xiang Hua <[EMAIL PROTECTED]> wrote:
> Hi,
>    But how do we "configure the disk array as JBOD"? We plan to use the
> disk array with RAID5 and create LUNs of 1T each.
>   So we will have many 1T LUNs, and we run mkfs on every LUN, giving us
> 12 filesystems, /data1 ... /data12, which will be used for HDFS.
>
>
> Best R.
>
> beatls
>
> On Tue, Oct 9, 2012 at 1:45 AM, Andy Isaacson <[EMAIL PROTECTED]> wrote:
>
>> On Mon, Oct 8, 2012 at 8:30 AM, Xiang Hua <[EMAIL PROTECTED]> wrote:
>> > Hi,
>> >    We have 4T of disk from a disk array.
>> >    I want to split 2T*1 into 1T*2, then add them to HDFS, which leads to
>> > more local storage directories.
>> >    This time we have 12 local directories (1T each). Is it harmful to
>> > HDFS performance?
>>
>> Assuming you're running a modern Hadoop on a recent Linux (2.6.38 or
>> later, or RHEL6):
>>
>> For best performance you should configure your disk array as JBOD
>> rather than RAID, then put one ext4 filesystem on each spindle. Do not
>> put multiple storage directories on a single spindle; that results in
>> very bad performance and no benefit over a single storage directory
>> per spindle. And do not put multiple spindles under a single storage
>> directory; that results in poor utilization and bad performance with
>> no significant benefit.
>>
>> 12 local storage directories will perform just fine assuming you have
>> enough CPU power to use them.
>>
>> -andy
>>