Re: get name of file in mapper output directory


The path is defined by the FileOutputFormat in use.  In particular, I think
this function is responsible:

http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.html#getDefaultWorkFile(org.apache.hadoop.mapreduce.TaskAttemptContext, java.lang.String)

It should give you the file path before all tasks have completed and the output
is committed to the final output path.
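
For example, something along these lines should work from the mapper's
cleanup() method (just a rough sketch, assuming the new
org.apache.hadoop.mapreduce API and a TextOutputFormat-based job; the
mapper class name and its type parameters are placeholders):

  import java.io.IOException;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

  public class PartNameMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void cleanup(Context context)
        throws IOException, InterruptedException {
      // getDefaultWorkFile() returns the task-side file, e.g.
      // <outdir>/_temporary/_attempt_.../part-m-00000, i.e. the path this
      // task writes to before the committer moves it to the final output dir.
      Path workFile = new TextOutputFormat<Text, Text>()
          .getDefaultWorkFile(context, "");
      System.err.println("This task is writing to: " + workFile);
    }
  }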

Luca

On May 23, 2011 14:42:04 Joey Echeverria wrote:
> Hi Mark,
>
> FYI, I'm moving the discussion over to
> [EMAIL PROTECTED] since your question is specific to
> MapReduce.
>
> You can derive the output name from the TaskAttemptID which you can
> get by calling getTaskAttemptID() on the context passed to your
> cleanup() function. The task attempt ID will look like this:
>
> attempt_200707121733_0003_m_000005_0
>
> You're interested in the m_000005 part. This gets translated into the
> output file name part-m-00005.
>
> -Joey
>
> On Sat, May 21, 2011 at 8:03 PM, Mark question <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> >  I'm running a map-only job, and at the end of each map task (i.e. in its
> > close() function) I want to open the file that that map has written
> > through its output collector.
> >
> >  I know "job.getWorkingDirectory()" would give me the parent path of the
> > file being written, but how do I get the full path or the file name (i.e.
> > part-00000, part-00001, ...)?
> >
> > Thanks,
> > Mark
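
For the TaskAttemptID route Joey describes above, a short sketch (again
assuming the new org.apache.hadoop.mapreduce API; the name is built with the
usual "part-m-NNNNN" zero-padded convention):

  import org.apache.hadoop.mapreduce.TaskAttemptID;

  // Inside the mapper's cleanup(Context context):
  TaskAttemptID attempt = context.getTaskAttemptID();
  int partition = attempt.getTaskID().getId();               // e.g. 5 for ..._m_000005_0
  String partName = String.format("part-m-%05d", partition); // -> "part-m-00005"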

--
Luca Pireddu
CRS4 - Distributed Computing Group
Loc. Pixina Manna Edificio 1
Pula 09010 (CA), Italy
Tel:  +39 0709250452