Re: one or more file system
Can you guys pls move this discussion to user@? Thanks.

On Oct 16, 2012, at 4:45 PM, Andy Isaacson wrote:

> RAID5 is suboptimal for HDFS due to the spindle imbalance issue (among
> other problems). Read this paper for details:
>
> "Disks are like Snowflakes: No Two Are Alike"
> www.usenix.org/event/hotos11/tech/final_files/Krevat.pdf
>
> For best performance configure your storage as JBOD instead of RAID,
> format each spindle as a separate ext4 filesystem, and put a datadir
> on each spindle.
>
> Your disk array will have a configuration utility to set JBOD instead
> of RAID. Please consult the documentation for your disk array for the
> details.
>
> If you must use RAID5 then one filesystem and one datadir is your best option.
>
> For *BAD* performance, put multiple logical volumes on a single RAID
> and put multiple datadirs on the RAID. This will result in low IOPS,
> low throughput, and high contention.
>
> -andy
>
> On Tue, Oct 9, 2012 at 2:13 AM, Xiang Hua <[EMAIL PROTECTED]> wrote:
>> Hi,
>>   But how do we "configure the disk array as JBOD"? We plan to use the disk
>> array with RAID5 and create LUNs of 1T each.
>>  So we have many 1T LUNs, and we run mkfs on every LUN, so we
>> have 12 filesystems, /data1 ... /data12, which will be added to HDFS.
>>
>>
>> Best R.
>>
>> beatls
>>
>> On Tue, Oct 9, 2012 at 1:45 AM, Andy Isaacson <[EMAIL PROTECTED]> wrote:
>>
>>> On Mon, Oct 8, 2012 at 8:30 AM, Xiang Hua <[EMAIL PROTECTED]> wrote:
>>>> Hi,
>>>>   We have 4T of disk from a disk array.
>>>>   I want to split one 2T volume into two 1T volumes, then add them to
>>>> HDFS, which leads to more local storage directories.
>>>>   This time we have 12 local directories (1T each). Is it harmful to HDFS
>>>> performance?
>>>
>>> Assuming you're running a modern Hadoop on a recent Linux (2.6.38 or
>>> later, or RHEL6):
>>>
>>> For best performance you should configure your disk array as JBOD
>>> rather than RAID, then put one ext4 filesystem on each spindle. Do not
>>> put multiple storage directories on a single spindle; that results in
>>> very bad performance and no benefit over a single storage directory
>>> per spindle. And do not put multiple spindles under a single storage
>>> directory; that results in poor utilization and bad performance with
>>> no significant benefit.
>>>
>>> 12 local storage directories will perform just fine assuming you have
>>> enough CPU power to use them.
>>>
>>> -andy
>>>
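
The "one datadir per spindle" advice quoted above can be sanity-checked with a short script. What follows is only a minimal sketch, not part of the original thread: the function names and the /data1 ... /data12 paths are hypothetical, and it assumes the data directories are listed in dfs.datanode.data.dir (dfs.data.dir on older releases) as a comma-separated value. Note that st_dev identifies a filesystem, not a physical disk, so several LUNs carved from one RAID5 array will still look distinct to this check.

```python
import os
from collections import defaultdict


def group_by_device(data_dirs, stat=os.stat):
    """Group directories by the device ID of the filesystem they live on."""
    groups = defaultdict(list)
    for path in data_dirs:
        groups[stat(path).st_dev].append(path)
    return dict(groups)


def shared_devices(data_dirs, stat=os.stat):
    """Return only the device IDs that hold more than one data directory."""
    return {
        dev: paths
        for dev, paths in group_by_device(data_dirs, stat).items()
        if len(paths) > 1
    }


def data_dir_value(data_dirs):
    """Join directories into the comma-separated form used for the data dir setting."""
    return ",".join(data_dirs)


if __name__ == "__main__":
    # Hypothetical mount points, one per spindle; adjust to the real layout.
    dirs = [f"/data{i}/dfs/dn" for i in range(1, 13)]
    existing = [d for d in dirs if os.path.isdir(d)]
    for dev, paths in shared_devices(existing).items():
        print(f"device {dev} holds several datadirs: {paths}")
    print(data_dir_value(dirs))
```

If the script reports a device holding several datadirs, those directories share one filesystem and will contend for the same spindle. An empty report does not prove the spindles are separate, for the reason given above.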

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/