

Re: hadoop disk selection
Moving this to user@ since it's not appropriate for general@.

On Fri, Sep 28, 2012 at 11:16 PM, Xiang Hua <[EMAIL PROTECTED]> wrote:
> Hi,
>   I want to select 4 local disks (600G each) combined with 3*800G disks form
> diskarray in one datanode.
>   Is there any problem? Performance?

The recommended configuration would be to partition and format each
disk with ext4, mount each one separately, then set
dfs.datanode.data.dir to a comma-separated list with one directory per
mountpoint (three shown here; list all of yours):

  <property>
     <name>dfs.datanode.data.dir</name>
     <value>/data/1/datadir,/data/2/datadir,/data/3/datadir</value>
  </property>

You may also want to set dfs.datanode.du.reserved to 1GB or thereabouts.
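
For the seven-disk layout in the question, the two settings together
might look roughly like the sketch below. The /data/4 through /data/7
mountpoints are an assumption, extended from the pattern above, so
substitute your own. dfs.datanode.du.reserved is given in bytes and
applies per volume, so 1GB is written out as 1073741824.

  <!-- hdfs-site.xml sketch; /data/4 through /data/7 are assumed mountpoints -->
  <property>
     <name>dfs.datanode.data.dir</name>
     <value>/data/1/datadir,/data/2/datadir,/data/3/datadir,/data/4/datadir,/data/5/datadir,/data/6/datadir,/data/7/datadir</value>
  </property>
  <property>
     <name>dfs.datanode.du.reserved</name>
     <!-- bytes, per volume: 1GB = 1073741824 -->
     <value>1073741824</value>
  </property>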

With this configuration your DN will fill all 7 datadirs at the same
rate pseudorandomly until the 600GB disks are nearly full; after that
it will write any further blocks to the 800GB disks only. Performance
will be OK, except that you will see hot-spots on the larger disks
when writing past the 600GB mark, since all new writes then land on
just 3 of the 7 disks. See
https://issues.apache.org/jira/browse/HDFS-1564 for one missing
feature in this area.

I would not recommend using RAID-0 for the datadirs. With independent
filesystems, a disk failure loses only the blocks on that one datadir,
and only those need to be re-replicated. With RAID-0, a single disk
failure loses all blocks stored on that DN, and all of them need to be
re-replicated. Also, RAID results in performance lockstep: a single
slow disk will slow down access to all blocks on that DN, while with
independent filesystems a single slow disk slows down only a fraction
of the blocks on that DN.

-andy