-
Can hadoop.tmp.dir be multivalued?
anil gupta 2012-12-18, 18:45
Hi All,
On my worker nodes I have 10 drives. So, in order to balance disk I/O, I want to distribute the disk read/write load evenly. "hadoop.tmp.dir" is used for a lot of things in MR:

mapreduce.cluster.local.dir
  Default: ${hadoop.tmp.dir}/mapred/local
  The local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk I/O. Directories that do not exist are ignored.

mapreduce.jobtracker.system.dir
  Default: ${hadoop.tmp.dir}/mapred/system
  The directory where MapReduce stores control files.

mapreduce.jobtracker.staging.root.dir
  Default: ${hadoop.tmp.dir}/mapred/staging
  The root of the staging area for users' job files. In practice, this should be the directory where users' home directories are located (usually /user).

mapreduce.cluster.temp.dir
  Default: ${hadoop.tmp.dir}/mapred/temp
  A shared directory for temporary files.

I am aware that mapreduce.cluster.local.dir can be multivalued and that I can explicitly set this property, but I was wondering whether it would be even better if I could set multiple values in the hadoop.tmp.dir property. Also, is the mapreduce.cluster.temp.dir property multivalued or single valued?
-- Thanks & Regards, Anil Gupta
-
Re: Can hadoop.tmp.dir be multivalued?
Harsh J 2012-12-18, 19:13
The purpose of hadoop.tmp.dir is as its name says: actual, temporary data. To give a better out-of-the-box (OOB) experience, so that users have little trouble configuring a cluster to get started, we use it as the base for the defaults of several properties that are actually required. This is not suitable for production, of course, and is done only for the OOB experience.
If you wish to give your TaskTracker or NodeManager several disks to parallelize I/O across, override their respective local directory configurations and stop relying on the out-of-the-box hadoop.tmp.dir default.
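As a rough sketch of what such an override could look like (not a tested production setup), the fragments below list one directory per physical disk. The mount points /data/1, /data/2 and /data/3 are hypothetical placeholders; a node with 10 drives would list all 10. The property names assumed here are mapred.local.dir for a 1.x TaskTracker and yarn.nodemanager.local-dirs for a 2.x NodeManager; please verify them against the default configuration files shipped with your release.

```xml
<!-- mapred-site.xml (Hadoop 1.x TaskTracker): a sketch, paths are hypothetical -->
<property>
  <name>mapred.local.dir</name>
  <!-- One directory per disk, comma-separated, no spaces between entries -->
  <value>/data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local</value>
</property>

<!-- yarn-site.xml (Hadoop 2.x NodeManager): a sketch, paths are hypothetical -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <!-- One directory per disk, comma-separated, no spaces between entries -->
  <value>/data/1/yarn/local,/data/2/yarn/local,/data/3/yarn/local</value>
</property>
```

Each property block goes inside the configuration element of its own file. With the local directories set explicitly like this, hadoop.tmp.dir can stay a single path that holds only genuinely temporary data.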
Also, which version of Hadoop are you asking about? The property mapreduce.cluster.temp.dir is not available in 1.x and is irrelevant in 2.x. It seems to be a legacy property that is no longer used.
-- Harsh J