MapReduce >> mail # user >> When and who move the reduce output file part-0000X to the final output directory


Ling Kun 2013-05-10, 03:19
Harsh J 2013-05-10, 05:26
Re: When and who move the reduce output file part-0000X to the final output directory
Thanks, Harsh!
Your reply helps me a lot.

Kun Ling
On Fri, May 10, 2013 at 1:26 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> The task itself moves it when it receives a commitTask message. See
> the OutputCommitter class:
>
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/OutputCommitter.html#commitTask(org.apache.hadoop.mapred.TaskAttemptContext)
>
> On Fri, May 10, 2013 at 8:49 AM, Ling Kun <[EMAIL PROTECTED]> wrote:
> > Dear all,
> >
> >      I am looking into the MR workflow and want to know more details about
> > the reduce output data copy.
> >
> >     Here is my question.
> >
> >    For the DFSIO test or some other MR jobs, each reduce task runs on a
> > TT and generates files in a directory named like this:
> > "XXX//_temporary/_attempt_201305101045_0005_r_000000_0/", which also
> > contains a result file named part-00000.
> >
> >   After the reducer has finished the task, the reducer output data
> > part-00000 should be moved from the local disk to HDFS.
> >
> > My question is: is part-00000 copied to HDFS at the moment the reducer
> > finishes the task? Who makes this file copy happen? The Reducer child?
> > The TaskTracker that runs the reduce task? Or the JobTracker?
> >
> > Thanks,
> >
> > yours,
> > Kun Ling
> >
> > --
> > http://www.lingcc.com
>
>
>
> --
> Harsh J
>
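
To make the commitTask step described above concrete, here is a minimal sketch that imitates it on the local filesystem with plain java.nio. It is not Hadoop code: the class CommitTaskSketch and its methods are hypothetical names invented for illustration, and only the "_temporary/_attempt_..." layout and the part-00000 file name come from this thread. As far as I know, with the stock file-based committer the attempt directory already lives on the job's output filesystem, so the commit is a move (rename) rather than a copy from local disk; the sketch assumes that behaviour.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Local-filesystem stand-in for the task commit step. NOT Hadoop's implementation. */
final class CommitTaskSketch {

    /** The running task attempt writes its result into a private attempt directory. */
    static Path writeAttemptOutput(Path outputDir, String attemptId, String data) {
        try {
            Path attemptDir = outputDir.resolve("_temporary").resolve("_" + attemptId);
            Files.createDirectories(attemptDir);
            Files.write(attemptDir.resolve("part-00000"), data.getBytes(StandardCharsets.UTF_8));
            return attemptDir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** On commit, the task itself moves each attempt file into the final output directory. */
    static void commitTask(Path outputDir, Path attemptDir) {
        try {
            // Snapshot the listing first so files are not moved while the directory is being iterated.
            List<Path> files;
            try (Stream<Path> listing = Files.list(attemptDir)) {
                files = listing.collect(Collectors.toList());
            }
            for (Path file : files) {
                Files.move(file, outputDir.resolve(file.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
            }
            // The attempt directory is empty now; remove it. "_temporary" is left for job-level cleanup.
            Files.delete(attemptDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs write + commit in a fresh temp directory and returns the committed file's path. */
    static Path runDemo() {
        try {
            Path outputDir = Files.createTempDirectory("job-output");
            Path attemptDir = writeAttemptOutput(
                    outputDir, "attempt_201305101045_0005_r_000000_0", "key\tvalue\n");
            commitTask(outputDir, attemptDir);
            return outputDir.resolve("part-00000");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("committed: " + runDemo());
    }
}
```

Running main prints the path of the committed part-00000 file, which now sits directly in the output directory rather than under _temporary. In a real job the equivalent work happens inside the committer's commitTask method, which is invoked in the task's own process, as the reply above says.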

--
http://www.lingcc.com