Xiang Hua 2012-10-08, 15:30
Andy Isaacson 2012-10-08, 17:45
Xiang Hua 2012-10-09, 09:13
Andy Isaacson 2012-10-16, 23:45
Can you guys please move this discussion to user@? Thanks.
On Oct 16, 2012, at 4:45 PM, Andy Isaacson wrote:
> RAID5 is suboptimal for HDFS due to the spindle imbalance issue (among
> other problems). Read this paper for details:
> "Disks are like Snowflakes: No Two Are Alike"
> For best performance configure your storage as JBOD instead of RAID,
> format each spindle as a separate ext4 filesystem, and put a datadir
> on each spindle.
> Your disk array will have a configuration utility to set JBOD instead
> of RAID. Please consult the documentation for your disk array for the
> If you must use RAID5 then one filesystem and one datadir is your best option.
> For *BAD* performance, put multiple logical volumes on a single RAID
> and put multiple datadirs on the RAID. This will result in low IOPS,
> low throughput, and high contention.
> On Tue, Oct 9, 2012 at 2:13 AM, Xiang Hua <[EMAIL PROTECTED]> wrote:
>> But how do we "configure the disk array as JBOD"? We plan to use a disk
>> array with RAID5 and carve it into LUNs of 1T each.
>> So we have many 1T LUNs, and we run mkfs on every LUN, so we
>> have 12 filesystems, /data1 ... /data12, which will be put into HDFS.
>> Best R.
>> On Tue, Oct 9, 2012 at 1:45 AM, Andy Isaacson <[EMAIL PROTECTED]> wrote:
>>> On Mon, Oct 8, 2012 at 8:30 AM, Xiang Hua <[EMAIL PROTECTED]> wrote:
>>>> We have 4T disk from a disk array.
>>>> I want to split 2T*1 into 1T*2, then add them to HDFS, which leads to
>>>> more local storage directories.
>>>> This time we have 12 local directories (1T each); is it harmful to HDFS?
>>> Assuming you're running a modern Hadoop on a recent Linux (2.6.38 or
>>> later, or RHEL6):
>>> For best performance you should configure your disk array as JBOD
>>> rather than RAID, then put one ext4 filesystem on each spindle. Do not
>>> put multiple storage directories on a single spindle; that results in
>>> very bad performance and no benefit over a single storage directory
>>> per spindle. And do not put multiple spindles under a single storage
>>> directory; that results in poor utilization and bad performance with
>>> no significant benefit.
>>> 12 local storage directories will perform just fine assuming you have
>>> enough CPU power to use them.
Arun C. Murthy
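
A minimal sketch, not from the thread, of one way to check the "one datadir per filesystem" advice quoted above: it groups candidate data directories by the device ID the kernel reports and flags any device that holds more than one of them. The helper names (group_by_device, shared_devices) are hypothetical, and the /data1 ... /data12 paths are taken from the question. Note the limit of this check: LUNs carved from one RAID5 group show up as separate devices, so it cannot see spindles shared behind a disk array.

```python
import os
from collections import defaultdict


def group_by_device(data_dirs, stat=os.stat):
    """Map each device ID to the list of data directories stored on it."""
    groups = defaultdict(list)
    for path in data_dirs:
        # st_dev identifies the filesystem/block device that holds the path.
        groups[stat(path).st_dev].append(path)
    return dict(groups)


def shared_devices(data_dirs, stat=os.stat):
    """Return only the devices that hold more than one data directory."""
    groups = group_by_device(data_dirs, stat)
    return {dev: dirs for dev, dirs in groups.items() if len(dirs) > 1}


if __name__ == "__main__":
    # Hypothetical layout from the question: /data1 ... /data12.
    candidates = ["/data%d" % i for i in range(1, 13)]
    existing = [d for d in candidates if os.path.isdir(d)]
    for dev, dirs in shared_devices(existing).items():
        print("device %s holds several datadirs: %s" % (dev, ", ".join(dirs)))
```

An empty result only means that no two directories share a filesystem; whether each filesystem sits on its own spindle still has to be confirmed in the disk array's configuration.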