RE: How to set "hadoop.tmp.dir" if I have multiple disks per node?
Vinayakumar B 2013-12-16, 09:27
hadoop.tmp.dir is not the configuration you are looking for if you want to spread disk I/O.
It is the default base directory (a single directory, not a list) that is used when you have not configured your own directories for processes such as the NameNode, DataNode and NodeManager.
The properties that accept comma-separated lists of directories are as follows.
1. dfs.namenode.name.dir for the NameNode, in hdfs-site.xml
2. dfs.datanode.data.dir for the DataNode, in hdfs-site.xml
3. yarn.nodemanager.local-dirs for the NodeManager, in yarn-site.xml
Please note that all of the above properties are for Hadoop 2.x.
Configure different subdirectories if you are using the same disk for multiple processes.
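As a rough illustration of the advice above, here is a small Python sketch that builds the comma-separated values for a node with ten disks and prints them as property elements. The mount points /data/1 to /data/10 and the subdirectory names dfs/dn and yarn/local are assumptions made up for this example, not Hadoop defaults; substitute whatever your disks are actually mounted as.

```python
# Hypothetical layout: one mount point per physical disk (/data/1 ... /data/10).
MOUNTS = [f"/data/{i}" for i in range(1, 11)]


def dir_list(subdir):
    """Join one subdirectory per disk into a comma-separated value."""
    return ",".join(f"{mount}/{subdir}" for mount in MOUNTS)


def prop(name, value):
    """Render one <property> element for a Hadoop *-site.xml file."""
    return (
        "<property>\n"
        f"  <name>{name}</name>\n"
        f"  <value>{value}</value>\n"
        "</property>"
    )


# Each process gets its own subdirectory on every disk, so the DataNode
# and the NodeManager can share the same ten disks without colliding.
datanode_dirs = dir_list("dfs/dn")         # goes in hdfs-site.xml
nodemanager_dirs = dir_list("yarn/local")  # goes in yarn-site.xml

print(prop("dfs.datanode.data.dir", datanode_dirs))
print(prop("yarn.nodemanager.local-dirs", nodemanager_dirs))
```

dfs.namenode.name.dir is left out of the sketch on purpose: a comma-separated list there makes the NameNode keep a full copy of its metadata in every directory for redundancy, so it does not spread load the way the other two properties do.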
From: Tao Xiao [mailto:[EMAIL PROTECTED]]
Sent: 16 December 2013 14:42
To: [EMAIL PROTECTED]
Subject: Re: How to set "hadoop.tmp.dir" if I have multiple disks per node?
In order to spread I/O among multiple disks, should I set "hadoop.tmp.dir" to a comma-separated list of directories located on different disks?
2013/12/16 Shekhar Sharma <[EMAIL PROTECTED]>
hadoop.tmp.dir is a directory created on the local file system.
For example, suppose you have set the hadoop.tmp.dir property to /home/training/hadoop.
This directory will be created when you format the NameNode by running
hadoop namenode -format
When you open this folder,
you will see two subfolders, dfs and mapred.
The /home/training/hadoop/mapred folder will also exist on HDFS.
Hope this clears things up.
Som Shekhar Sharma
On Mon, Dec 16, 2013 at 1:42 PM, Dieter De Witte <[EMAIL PROTECTED]> wrote:
> Make sure to also set mapred.local.dir to the same set of output
> directories; this is where the intermediate key-value pairs are stored!
> Regards, Dieter
> 2013/12/16 Tao Xiao <[EMAIL PROTECTED]>
>> I have ten disks per node, and I don't know what value I should set for
>> "hadoop.tmp.dir". Some said this property refers to a location on local disk
>> while others said it refers to a directory in HDFS. I'm confused; who
>> can explain it?
>> I want to spread I/O since I have ten disks per node, so should I set
>> "hadoop.tmp.dir" to a comma-separated list of directories (which are on
>> different disks)?