MapReduce >> mail # user >> When and who move the reduce output file part-0000X to the final output directory


When is the reduce output file part-0000X moved to the final output directory, and by whom?
Dear all,

I am looking into the MR workflow and would like to know more about how
the reduce output data is copied.

    Here is my question.

For the DFSIO test and other MR jobs, each reduce task runs on a
TaskTracker (TT) and writes its files to a directory named like
"XXX//_temporary/_attempt_201305101045_0005_r_000000_0/", which also
contains a result file named part-00000.
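
To make the directory name above easier to read, here is a small Python sketch that splits it into its parts. It assumes the usual attempt naming convention (JobTracker start identifier, job number, task type m or r, task number, attempt number) and that part files are numbered by task with five digits. The helper names parse_attempt_dir and part_file_name are hypothetical and are not part of Hadoop.

```python
import re
from typing import NamedTuple


class AttemptId(NamedTuple):
    jt_identifier: str   # JobTracker start identifier, e.g. "201305101045"
    job_number: int      # job sequence number within that JobTracker
    task_type: str       # "map" or "reduce"
    task_number: int     # index of the task within the job
    attempt_number: int  # retry counter for this task


# Assumed layout: [_]attempt_<jt id>_<job>_<m|r>_<task>_<attempt>
_ATTEMPT_RE = re.compile(r"^_?attempt_(\d+)_(\d+)_([mr])_(\d+)_(\d+)$")


def parse_attempt_dir(name: str) -> AttemptId:
    """Split a temporary attempt directory name into its fields."""
    match = _ATTEMPT_RE.match(name.strip("/"))
    if match is None:
        raise ValueError(f"not an attempt directory name: {name!r}")
    jt, job, kind, task, attempt = match.groups()
    task_type = "reduce" if kind == "r" else "map"
    return AttemptId(jt, int(job), task_type, int(task), int(attempt))


def part_file_name(task_number: int) -> str:
    """Build the part file name for a task (assumed five-digit padding)."""
    return f"part-{task_number:05d}"


attempt = parse_attempt_dir("_attempt_201305101045_0005_r_000000_0/")
print(attempt)
print(part_file_name(attempt.task_number))
```

Read this way, the directory in my example belongs to the first attempt (0) of reduce task 0 in job 5, which matches the part-00000 file name.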

After the reducer finishes its task, the reducer output file part-00000
should be moved from the local disk to HDFS.

My question is: is part-00000 copied to HDFS at the moment the reducer
finishes its task? Who makes this file copy happen? The reducer child? The
TaskTracker that runs the reduce task? Or the JobTracker?

Thanks,

yours,
Kun Ling

--
http://www.lingcc.com
Harsh J 2013-05-10, 05:26
Ling Kun 2013-05-10, 05:40