Hadoop, mail # user - How can I get the intermediate output file from mapper class?


Liu, Raymond 2012-08-10, 03:22
Re: How can I get the intermediate output file from mapper class?
Harsh J 2012-08-10, 04:29
Hi,

If you want the map -> intermediate -> reduce files, the ones you need
are "file.out" and "file.out.index". Try a pattern that matches these
and you should have them.
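
A minimal sketch of how one might look for those two files, assuming they are kept somewhere under the directories listed in mapred.local.dir. The helper name `find_intermediate_files` and the decision to walk every subdirectory are illustrative choices, not part of Hadoop; the script only checks for the two file names mentioned above.

```python
import os

# File names mentioned above for the map-side intermediate output.
INTERMEDIATE_NAMES = {"file.out", "file.out.index"}


def find_intermediate_files(local_dirs):
    """Return sorted paths of file.out / file.out.index under the given dirs.

    `local_dirs` is the comma-separated mapred.local.dir value, already split
    into a list. Hypothetical helper, for illustration only.
    """
    matches = []
    for root_dir in local_dirs:
        # os.walk yields nothing for a missing or unreadable directory.
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            for name in filenames:
                if name in INTERMEDIATE_NAMES:
                    matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    # The four local dirs quoted in this thread.
    dirs = ["/mnt/DP_disk%d/raymond/hdfs/mapred" % i for i in range(1, 5)]
    for path in find_intermediate_files(dirs):
        print(path)
```

If this prints nothing after a job has run, the files were most likely cleaned up already, which points back at the keep pattern rather than at the search.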

The "XXXXX" kind of files are what MR produces on HDFS as regular
outputs - these aren't intermediate.

On Fri, Aug 10, 2012 at 8:52 AM, Liu, Raymond <[EMAIL PROTECTED]> wrote:
> Hi
>
>         I am trying to access the intermediate files that the MapReduce mapper output saves to the local filesystem.
>
>         I found this by searching: http://stackoverflow.com/questions/7867608/hadoop-mapreduce-intermediate-output
>
>         I am using Hadoop 1.0.3, and I set the following property in mapred-site.xml:
>
> <property>
>   <name>keep.task.files.pattern</name>
>   <value>.*_m_00000*</value>
> </property>
>
> Then, after restarting Hadoop and running some jobs, I did see task directories in my local dir, like:
>
> /mnt/DP_disk1/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
>
> But I still cannot find any output dir there.
>
> I have four disks mounted for the local dir, configured as follows, and I only find the jars and work dirs in them:
>
> <property>
> <name>mapred.local.dir</name>
> <value>/mnt/DP_disk1/raymond/hdfs/mapred,/mnt/DP_disk2/raymond/hdfs/mapred,/mnt/DP_disk3/raymond/hdfs/mapred,/mnt/DP_disk4/raymond/hdfs/mapred</value>
> </property>
>
> Then I searched through them:
>
> raymond@sr173:~$ ls /mnt/DP_disk1/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> jars  job.xml
> raymond@sr173:~$ ls /mnt/DP_disk2/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> raymond@sr173:~$ ls /mnt/DP_disk3/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> jobToken  work
> raymond@sr173:~$ ls /mnt/DP_disk4/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
>
> I also searched the ttprivate dir, with no luck there:
>
> raymond@sr173:~$ ls /mnt/DP_disk4/raymond/hdfs/mapred/ttprivate/taskTracker/raymond/jobcache/job_201208101040_0003/attempt_201208101040_0003_m_000021_0/taskjvm.sh
> /mnt/DP_disk4/raymond/hdfs/mapred/ttprivate/taskTracker/raymond/jobcache/job_201208101040_0003/attempt_201208101040_0003_m_000021_0/taskjvm.sh
>
> So, is there anything I am still missing?
>
>
> Best Regards,
> Raymond Liu
>
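
One way to sanity-check the keep.task.files.pattern value quoted above is to try it against task IDs offline. The sketch below assumes, without confirmation from this thread, that the value is applied as a regular expression that must match the whole task attempt ID (the behaviour of Java's Pattern.matches); substring matching is shown for comparison. The helper names and the second and third sample IDs are made up for illustration.

```python
import re

# Value quoted above for keep.task.files.pattern.
PATTERN = r".*_m_00000*"

# The first ID is copied from the ttprivate listing in this thread;
# the other two are hypothetical examples in the same style.
CANDIDATES = [
    "attempt_201208101040_0003_m_000021_0",
    "attempt_201208101040_0003_m_000000_0",
    "task_201208101040_0003_m_000000",
]


def whole_string_match(pattern, text):
    """True if the pattern matches the entire string (assumed semantics)."""
    return re.fullmatch(pattern, text) is not None


def substring_match(pattern, text):
    """True if the pattern matches anywhere in the string."""
    return re.search(pattern, text) is not None


if __name__ == "__main__":
    for candidate in CANDIDATES:
        print(candidate,
              "whole:", whole_string_match(PATTERN, candidate),
              "substring:", substring_match(PATTERN, candidate))
```

Under the whole-string assumption, the trailing "_0" of an attempt ID is not covered by this pattern, so a variant ending in ".*" (for example ".*_m_0000.*") may be worth trying. Check the TaskTracker source for your version before relying on this.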

--
Harsh J
Liu, Raymond 2012-08-10, 06:42
Liu, Raymond 2012-08-10, 07:24