anil gupta 2012-12-18, 19:41
Re: Can hadoop.tmp.dir be multivalued?
Harsh J 2012-12-18, 19:50
Hi Anil,
Answering over [EMAIL PROTECTED] [https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!forum/cdh-user] because the answer is CDH-specific.

The MR1 properties listing is documented at the MR1 Apache Hadoop docs site, under http://archive.cloudera.com/cdh4/cdh/4/mr1/ at http://archive.cloudera.com/cdh4/cdh/4/mr1/mapred-default.html

On Wed, Dec 19, 2012 at 1:11 AM, anil gupta <[EMAIL PROTECTED]> wrote:
> Hi Harsh,
>
> Sorry, I forgot to mention that I am using cdh4.1 and using MRv1. I got the
> mapreduce.cluster.temp.dir property from
> http://hadoop.apache.org/docs/mapreduce/current/mapred-default.html. Is it
> an incorrect source?
> Thanks for the prompt reply.
>
> ~Anil
>
>
> On Tue, Dec 18, 2012 at 11:13 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>> The purpose of the hadoop.tmp.dir is as its name says - for actual,
>> temporary data. For a more out-of-box experience, such that users have
>> little trouble configuring to get started, we use it as a base
>> property for several actual required properties. This is not suitable
>> for production of course - and is only done for OOB experience.
>>
>> If you wish to grant your TaskTracker or NodeManager several disks to
>> parallelize IO upon, use/override their respective local directory
>> configurations - and quit leveraging the out-of-box hadoop.tmp.dir
>> default.
>>
>> Also, what version of Hadoop are you asking your question around? The
>> property mapreduce.cluster.temp.dir does not exist/is not available in
>> 1.x and is irrelevant in 2.x. It seems to be a legacy property that is
>> no longer utilized.
>>
>> On Wed, Dec 19, 2012 at 12:15 AM, anil gupta <[EMAIL PROTECTED]>
>> wrote:
>> > Hi All,
>> >
>> > On my worker nodes I have 10 drives. So, in order to balance disk I/O, I
>> > wanted to evenly distribute the disk read/write load. "hadoop.tmp.dir" is
>> > used for a lot of things in MR.
>> >
>> > mapreduce.cluster.local.dir
>> >   Default: ${hadoop.tmp.dir}/mapred/local
>> >   The local directory where MapReduce stores intermediate data files.
>> >   May be a comma-separated list of directories on different devices in
>> >   order to spread disk i/o. Directories that do not exist are ignored.
>> >
>> > mapreduce.jobtracker.system.dir
>> >   Default: ${hadoop.tmp.dir}/mapred/system
>> >   The directory where MapReduce stores control files.
>> >
>> > mapreduce.jobtracker.staging.root.dir
>> >   Default: ${hadoop.tmp.dir}/mapred/staging
>> >   The root of the staging area for users' job files. In practice, this
>> >   should be the directory where users' home directories are located
>> >   (usually /user).
>> >
>> > mapreduce.cluster.temp.dir
>> >   Default: ${hadoop.tmp.dir}/mapred/temp
>> >   A shared directory for temporary files.
>> >
>> > I am aware that mapreduce.cluster.local.dir can be multivalued and I can
>> > explicitly set this property, but I was wondering whether it would be even
>> > better if I could set multiple values in the hadoop.tmp.dir property. Also,
>> > is the mapreduce.cluster.temp.dir property multivalued or single-valued?
>> >
>> > --
>> > Thanks & Regards,
>> > Anil Gupta
>>
>>
>>
>> --
>> Harsh J
>
>
>
>
> --
> Thanks & Regards,
> Anil Gupta

--
Harsh J
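
For readers following the advice above to override the local directory property rather than hadoop.tmp.dir, here is a minimal sketch of how the comma-separated value could be built for ten drives. It is an illustration under assumptions, not something posted in the thread: the mount points /data/1 through /data/10 are hypothetical, the helper names are made up for this example, and the property name mapred.local.dir is assumed to be the MRv1 equivalent of the mapreduce.cluster.local.dir entry quoted above. Check the mapred-default.html for your release before using it.

```python
from xml.sax.saxutils import escape


def local_dirs_value(mount_points, subdir="mapred/local"):
    """Join one directory per disk into a single comma-separated value."""
    return ",".join(f"{m.rstrip('/')}/{subdir}" for m in mount_points)


def property_xml(name, value):
    """Render a <property> element in the style used by Hadoop *-site.xml files."""
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{escape(value)}</value>\n"
        "</property>"
    )


# Hypothetical mount points, one per physical drive on a worker node.
mounts = [f"/data/{i}" for i in range(1, 11)]
value = local_dirs_value(mounts)

# Assumed MRv1 property name; verify it against your distribution's docs.
print(property_xml("mapred.local.dir", value))
```

The printed element would go inside the <configuration> block of mapred-site.xml on each TaskTracker node. hadoop.tmp.dir itself is left as a single directory.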