MapReduce >> mail # user >> Re: Can hadoop.tmp.dir be multivalued?


anil gupta 2012-12-18, 19:41
Re: Can hadoop.tmp.dir be multivalued?
Hi Anil,

Answering over [EMAIL PROTECTED]
[https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!forum/cdh-user]
because the answer is CDH-specific.

The MR1 properties listing is documented on the MR1 Apache Hadoop docs
site, under http://archive.cloudera.com/cdh4/cdh/4/mr1/, at
http://archive.cloudera.com/cdh4/cdh/4/mr1/mapred-default.html
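
For the 10-drive case raised in this thread, the comma-separated value can
be generated rather than typed by hand. The following is only a minimal
sketch: it assumes the MR1 property name mapred.local.dir (the MR1
counterpart of mapreduce.cluster.local.dir), and the /data/1 ... /data/10
mount points and the helper name local_dir_property are hypothetical, so
substitute your own paths. The printed element would go into
mapred-site.xml on each TaskTracker.

```python
from xml.sax.saxutils import escape


def local_dir_property(mounts, name="mapred.local.dir", subdir="mapred/local"):
    """Build a Hadoop <property> element with one local dir per mount point.

    `name` defaults to the MR1 property; `subdir` is an illustrative choice.
    """
    # One directory per physical disk, joined into a comma-separated list.
    dirs = [f"{m.rstrip('/')}/{subdir}" for m in mounts]
    value = ",".join(dirs)
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{escape(value)}</value>\n"
        "</property>"
    )


# Hypothetical mount points for the 10 drives mentioned in the thread.
MOUNTS = [f"/data/{i}" for i in range(1, 11)]

if __name__ == "__main__":
    print(local_dir_property(MOUNTS))
```

Setting the local directory property explicitly like this leaves
hadoop.tmp.dir as a single path, which matches the advice quoted below to
override the specific property instead of relying on the out-of-box default.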

On Wed, Dec 19, 2012 at 1:11 AM, anil gupta <[EMAIL PROTECTED]> wrote:
> Hi Harsh,
>
> Sorry, I forgot to mention that I am using CDH 4.1 with MRv1. I got the
> mapreduce.cluster.temp.dir property from
> http://hadoop.apache.org/docs/mapreduce/current/mapred-default.html. Is it
> an incorrect source?
> Thanks for the prompt reply.
>
> ~Anil
>
>
> On Tue, Dec 18, 2012 at 11:13 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>> The purpose of the hadoop.tmp.dir is as its name says - for actual,
>> temporary data. For a more out-of-box experience, such that users have
>> little trouble configuring to get started, we use it as a base
>> property for several actual required properties. This is not suitable
>> for production of course - and is only done for OOB experience.
>>
>> If you wish to grant your TaskTracker or NodeManager several disks to
>> parallelize IO upon, use/override their respective local directory
>> configurations - and quit leveraging the out-of-box hadoop.tmp.dir
>> default.
>>
>> Also, what version of Hadoop are you asking your question around? The
>> property mapreduce.cluster.temp.dir does not exist/is not available in
>> 1.x and is irrelevant in 2.x. It seems to be a legacy property that is
>> no longer utilized.
>>
>> On Wed, Dec 19, 2012 at 12:15 AM, anil gupta <[EMAIL PROTECTED]>
>> wrote:
>> > Hi All,
>> >
>> > On my worker nodes I have 10 drives, so in order to balance disk I/O I
>> > want to evenly distribute the disk read/write load. "hadoop.tmp.dir" is
>> > used for a lot of things in MR:
>> >
>> > mapreduce.cluster.local.dir
>> >   Default: ${hadoop.tmp.dir}/mapred/local
>> >   The local directory where MapReduce stores intermediate data files.
>> >   May be a comma-separated list of directories on different devices in
>> >   order to spread disk i/o. Directories that do not exist are ignored.
>> >
>> > mapreduce.jobtracker.system.dir
>> >   Default: ${hadoop.tmp.dir}/mapred/system
>> >   The directory where MapReduce stores control files.
>> >
>> > mapreduce.jobtracker.staging.root.dir
>> >   Default: ${hadoop.tmp.dir}/mapred/staging
>> >   The root of the staging area for users' job files. In practice, this
>> >   should be the directory where users' home directories are located
>> >   (usually /user).
>> >
>> > mapreduce.cluster.temp.dir
>> >   Default: ${hadoop.tmp.dir}/mapred/temp
>> >   A shared directory for temporary files.
>> >
>> > I am aware that mapreduce.cluster.local.dir can be multivalued and that
>> > I can explicitly set this property, but I was wondering whether it would
>> > be even better to set multiple values in the hadoop.tmp.dir property.
>> > Also, is the mapreduce.cluster.temp.dir property multivalued or
>> > single-valued?
>> >
>> > --
>> > Thanks & Regards,
>> > Anil Gupta
>>
>>
>>
>> --
>> Harsh J
>
>
>
>
> --
> Thanks & Regards,
> Anil Gupta

--
Harsh J