Re: Configuring Hadoop clusters with multiple PCs, each of which has 2 hard disks (Sata+SSD)
On Thu, Jul 12, 2012 at 2:02 PM, Ivangelion <[EMAIL PROTECTED]> wrote:
Install all hadoop libs on the SATA disk.
> - 1 PC: pure namenode
Configure dfs.name.dir to write to both places, one under the SATA disk
and the other under the SSD, for redundancy (failure tolerance). This is
in hdfs-site.xml.
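As a sketch, assuming the two disks are mounted at /sata and /ssd (the mount points and directory names here are illustrative, not from your setup), the entry would look like:

```xml
<!-- hdfs-site.xml: two dfs.name.dir locations, one per disk, for redundancy.
     Mount points /sata and /ssd are assumed examples. -->
<property>
  <name>dfs.name.dir</name>
  <value>/sata/dfs/name,/ssd/dfs/name</value>
</property>
```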
> - Other 5 PCs: datanodes (1 of which also serves as secondary namenode)
Configure dfs.data.dir to write to a location on to the SATA disk
(SATA/dfs/data). This is in hdfs-site.xml.
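A minimal sketch of that entry, again assuming the SATA disk is mounted at /sata:

```xml
<!-- hdfs-site.xml: DataNode block storage on the larger SATA disk.
     The /sata mount point is an assumed example. -->
<property>
  <name>dfs.data.dir</name>
  <value>/sata/dfs/data</value>
</property>
```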
Configure mapred.local.dir to write to a location on the SSD disk
(SSD/mapred/local). This is in mapred-site.xml.
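A corresponding sketch, assuming the SSD is mounted at /ssd:

```xml
<!-- mapred-site.xml: MapReduce intermediate/spill data on the faster SSD.
     The /ssd mount point is an assumed example. -->
<property>
  <name>mapred.local.dir</name>
  <value>/ssd/mapred/local</value>
</property>
```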
> - Sata disk with bigger size: common HDFS data storage
> - SSD disk with smaller size but faster: temporary data storage when
> processing map reduce jobs or doing data analyzing.
If you limit your MR jobs to use only SSD space, that is all the space
each mapper gets to write into. So if a mapper tries to write, or a
reducer tries to read, over 200 GB of data, it may run into
space-unavailability issues. To avoid this, configure mapred.local.dir
to use SATA/mapred/local as well, if this becomes a problem.
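If the SSD alone proves too small, mapred.local.dir accepts a comma-separated list of directories and the TaskTracker will spread local data across all of them. A sketch with the same assumed mount points:

```xml
<!-- mapred-site.xml: list both disks so MR local data can spill
     beyond the SSD. Mount points /ssd and /sata are assumed examples. -->
<property>
  <name>mapred.local.dir</name>
  <value>/ssd/mapred/local,/sata/mapred/local</value>
</property>
```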
> Is there anything that needs to be modified?
Yes: configure fs.checkpoint.dir to SSD/dfs/namesecondary so the SNN
uses that location. This also goes in hdfs-site.xml.
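A sketch of that entry on the node running the SNN, assuming the SSD mount point /ssd as before:

```xml
<!-- hdfs-site.xml (on the SecondaryNameNode host): checkpoint storage on
     the SSD. The /ssd mount point is an assumed example. -->
<property>
  <name>fs.checkpoint.dir</name>
  <value>/ssd/dfs/namesecondary</value>
</property>
```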
After configuring these, you may ignore hadoop.tmp.dir, as it
shouldn't be used for anything else.