I don't understand why multiple disks would be particularly beneficial for
a Map/Reduce job. Would a Map/Reduce job be I/O bound *as well as* CPU
bound? I would think that simply reading and parsing large files would
still require dedicated CPU time.
On Sun, Apr 22, 2012 at 3:14 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> You can use mapred.local.dir for this purpose. It accepts a list of
> directories tasks may use, just like dfs.data.dir uses multiple disks
> for block writes/reads.
> On Sun, Apr 22, 2012 at 12:50 PM, mete <[EMAIL PROTECTED]> wrote:
> > Hello folks,
> > I have a job that processes text files from HDFS on the local fs (temp
> > directory) and then copies them back to HDFS.
> > I added another drive to each server to get better I/O performance, but as
> > far as I can see, hadoop.tmp.dir will not benefit from multiple disks, even
> > if I set up two different folders on different disks (dfs.data.dir works
> > fine). As a result, the disk holding the temp folder is highly utilized, while
> > the other one is comparatively idle.
> > Does anyone have an idea what to do? (I am using CDH3u3.)
> > Thanks in advance
> > Mete
> Harsh J