hadoop.tmp.dir with multiple disks


mete 2012-04-22, 07:20
Harsh J 2012-04-22, 08:14
Re: hadoop.tmp.dir with multiple disks
I don't understand why multiple disks would be particularly beneficial for
a Map/Reduce job... wouldn't a map/reduce job be I/O *as well as CPU*
bound? I would think that simply reading and parsing large files would
still require dedicated blocks of CPU time.

On Sun, Apr 22, 2012 at 3:14 AM, Harsh J <[EMAIL PROTECTED]> wrote:

> You can use mapred.local.dir for this purpose. It accepts a list of
> directories tasks may use, just like dfs.data.dir uses multiple disks
> for block writes/reads.
>
> On Sun, Apr 22, 2012 at 12:50 PM, mete <[EMAIL PROTECTED]> wrote:
> > Hello folks,
> >
> > I have a job that processes text files from HDFS on the local fs (temp
> > directory) and then copies them back to HDFS.
> > I added another drive to each server to get better I/O performance,
> > but as far as I could see hadoop.tmp.dir will not benefit from multiple
> > disks, even if I set up two different folders on different disks
> > (dfs.data.dir works fine). As a result the disk holding the temp
> > folder is highly utilized, while the other one sits mostly idle.
> > Does anyone have an idea on what to do? (I am using cdh3u3)
> >
> > Thanks in advance
> > Mete
>
>
>
> --
> Harsh J
>

--
Jay Vyas
MMSB/UCHC
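
A minimal sketch of the configuration Harsh describes, using the
cdh3u3-era property names from this thread; the /data1 and /data2 mount
points are hypothetical and stand in for one directory per physical disk:

  <!-- mapred-site.xml: spread task scratch space across both disks -->
  <property>
    <name>mapred.local.dir</name>
    <value>/data1/mapred/local,/data2/mapred/local</value>
  </property>

  <!-- hdfs-site.xml: dfs.data.dir already accepts the same
       comma-separated form, which is why HDFS block I/O spreads across
       disks while the temp directory does not -->
  <property>
    <name>dfs.data.dir</name>
    <value>/data1/dfs/data,/data2/dfs/data</value>
  </property>

Note that hadoop.tmp.dir itself is a single base path that other
properties default to, so listing two folders there does not spread the
load; pointing mapred.local.dir at both disks is what lets tasks allocate
their local/intermediate files across them.
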
Edward Capriolo 2012-04-22, 14:44
mete 2012-04-23, 07:40