

Re: How can I get the intermediate output file from mapper class?
Hi,

The map->intermediate->reduce data you are after is kept in the
"file.out" and "file.out.index" files. So try a pattern that matches
these and you should have it.
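
As a rough sketch (not run against a real cluster), a small shell function like the one below could search every mapred.local.dir entry for those two file names in one go. The name find_map_outputs is made up for illustration, and it assumes the files sit somewhere under a path that contains the job ID.

```shell
#!/bin/sh
# Sketch: search each mapred.local.dir entry for intermediate map output files.
# find_map_outputs is a hypothetical helper name, not part of Hadoop.
find_map_outputs() {
  local_dirs=$1   # comma-separated value of mapred.local.dir
  job_id=$2       # e.g. job_201208101040_0003
  old_ifs=$IFS
  IFS=,
  for dir in $local_dirs; do
    IFS=$old_ifs
    # Skip entries that do not exist on this node.
    [ -d "$dir" ] || continue
    # Assumption: the files live under a path that contains the job ID.
    find "$dir" -type f \( -name 'file.out' -o -name 'file.out.index' \) \
      -path "*${job_id}*"
  done
  IFS=$old_ifs
}
```

It would be called with the mapred.local.dir value from mapred-site.xml and a job ID, for example: find_map_outputs "/mnt/DP_disk1/raymond/hdfs/mapred,/mnt/DP_disk2/raymond/hdfs/mapred" job_201208101040_0003. If it prints nothing, the files have most likely been cleaned up already or were never kept.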

Files of the "XXXXX" kind are what MR writes to HDFS as its regular
outputs - these aren't intermediate.

On Fri, Aug 10, 2012 at 8:52 AM, Liu, Raymond <[EMAIL PROTECTED]> wrote:
> Hi
>
>         I am trying to access the intermediate files saved to the local filesystem from MapReduce's mapper output.
>
>         I have googled this one : http://stackoverflow.com/questions/7867608/hadoop-mapreduce-intermediate-output
>
>         I am using Hadoop 1.0.3, and I set the following property in mapred-site.xml:
>
> <property>
>   <name>keep.task.files.pattern</name>
>   <value>.*_m_00000*</value>
> </property>
>
> Then, after restarting Hadoop and running some jobs, I did see task directories in my local dir, like:
>
> /mnt/DP_disk1/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
>
> But I still cannot find any output dir there.
>
> I have four disks mounted for the local dir, configured as follows, and only the jars and work dirs are found there:
>
> <property>
> <name>mapred.local.dir</name>
> <value>/mnt/DP_disk1/raymond/hdfs/mapred,/mnt/DP_disk2/raymond/hdfs/mapred,/mnt/DP_disk3/raymond/hdfs/mapred,/mnt/DP_disk4/raymond/hdfs/mapred</value>
> </property>
>
> Then I searched through them:
>
> raymond@sr173:~$ ls /mnt/DP_disk1/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> jars  job.xml
> raymond@sr173:~$ ls /mnt/DP_disk2/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> raymond@sr173:~$ ls /mnt/DP_disk3/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> jobToken  work
> raymond@sr173:~$ ls /mnt/DP_disk4/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
>
> I also searched the ttprivate dir, with no luck there:
>
> raymond@sr173:~$ ls /mnt/DP_disk4/raymond/hdfs/mapred/ttprivate/taskTracker/raymond/jobcache/job_201208101040_0003/attempt_201208101040_0003_m_000021_0/taskjvm.sh
> /mnt/DP_disk4/raymond/hdfs/mapred/ttprivate/taskTracker/raymond/jobcache/job_201208101040_0003/attempt_201208101040_0003_m_000021_0/taskjvm.sh
>
> So, is there anything I am still missing?
>
>
> Best Regards,
> Raymond Liu
>

--
Harsh J