Re: How can I get the intermediate output file from mapper class?
Harsh J 2012-08-10, 04:29
Hi,
You need the "file.out" and "file.out.index" files when you want the map->intermediate->reduce files, so try a pattern that matches these and you should have it. The "XXXXX" kind of files are what MR produces on HDFS as regular outputs; these aren't intermediate.

On Fri, Aug 10, 2012 at 8:52 AM, Liu, Raymond <[EMAIL PROTECTED]> wrote:
> Hi
>
> I am trying to access the intermediate files saved to the local filesystem from mapreduce's mapper output.
>
> I have googled this one : http://stackoverflow.com/questions/7867608/hadoop-mapreduce-intermediate-output
>
> I am using hadoop 1.0.3, and I did set the following property in mapred-site.xml
>
> <property>
> <name>keep.task.files.pattern</name>
> <value>.*_m_00000*</value>
> </property>
>
> Then after restarting hadoop and running some jobs, I did see tasks in my local dir like:
>
> /mnt/DP_disk1/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
>
> But I still cannot find any output dir there.
>
> I have four disks mounted for the local dir, and only the jars and work dirs are found, as follows:
>
> <property>
> <name>mapred.local.dir</name>
> <value>/mnt/DP_disk1/raymond/hdfs/mapred,/mnt/DP_disk2/raymond/hdfs/mapred,/mnt/DP_disk3/raymond/hdfs/mapred,/mnt/DP_disk4/raymond/hdfs/mapred</value>
> </property>
>
> Then I searched through them:
>
> raymond@sr173:~$ ls /mnt/DP_disk1/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> jars job.xml
> raymond@sr173:~$ ls /mnt/DP_disk2/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> raymond@sr173:~$ ls /mnt/DP_disk3/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> jobToken work
> raymond@sr173:~$ ls /mnt/DP_disk4/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
>
> And I also searched the ttprivate dir, no luck there:
>
> raymond@sr173:~$ ls /mnt/DP_disk4/raymond/hdfs/mapred/ttprivate/taskTracker/raymond/jobcache/job_201208101040_0003/attempt_201208101040_0003_m_000021_0/taskjvm.sh
> /mnt/DP_disk4/raymond/hdfs/mapred/ttprivate/taskTracker/raymond/jobcache/job_201208101040_0003/attempt_201208101040_0003_m_000021_0/taskjvm.sh
>
> So, is there anything I am still missing?
>
>
> Best Regards,
> Raymond Liu
>

--
Harsh J
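
Since the files to look for are "file.out" and "file.out.index", and they could sit under any of the configured local dirs, a small script can do the search instead of running ls on each disk by hand. The following is only a rough sketch: the function name find_intermediate_files is made up for illustration, the directory list is copied from the mapred.local.dir value quoted above, and it assumes nothing about the layout below those directories. It finds the files only if the TaskTracker actually kept them.

```python
import os

# File names taken from the reply above; everything else is illustrative.
INTERMEDIATE_NAMES = {"file.out", "file.out.index"}


def find_intermediate_files(local_dirs):
    """Return sorted paths of map intermediate files found under local_dirs.

    Directories that do not exist are skipped silently (os.walk yields nothing).
    """
    matches = []
    for root in local_dirs:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name in INTERMEDIATE_NAMES:
                    matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    # The four directories from the quoted mapred.local.dir setting.
    local_dirs = [f"/mnt/DP_disk{i}/raymond/hdfs/mapred" for i in range(1, 5)]
    for path in find_intermediate_files(local_dirs):
        print(path)
```

Run it on the TaskTracker node as a user that can read the local dirs. If it prints nothing, the intermediate files were not retained for that job.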