Hadoop >> mail # user >> get name of file in mapper output directory


Re: get name of file in mapper output directory


The path is defined by the FileOutputFormat in use.  In particular, I think
this function is responsible:

http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.html#getDefaultWorkFile(org.apache.hadoop.mapreduce.TaskAttemptContext,%20java.lang.String)

It should give you the path of the file while it is still in the task's
temporary work directory, i.e. before all tasks have completed and the output
is committed to the final output path.
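To make the layout concrete, here is a small stdlib-only Java sketch of where a task's file sits before and after commit. It assumes the 0.20-era layout described in the old-API FileOutputFormat.getWorkOutputPath Javadoc, ${mapred.output.dir}/_temporary/_${taskid}. Later Hadoop releases nest the temporary directory differently, so in a real job you would ask the output format (e.g. via getDefaultWorkFile(context, "")) rather than build the path by hand. WorkFilePath, workFile and committedFile are hypothetical names, not Hadoop APIs.

```java
/**
 * Sketch only, not Hadoop code. Assumption: Hadoop 0.20.x keeps a task's
 * uncommitted output under <output>/_temporary/_<taskAttemptId>/.
 * WorkFilePath, workFile and committedFile are hypothetical names.
 */
final class WorkFilePath {

    private WorkFilePath() {}

    // Drop one trailing slash so "out" and "out/" give the same result.
    private static String trim(String dir) {
        return dir.endsWith("/") ? dir.substring(0, dir.length() - 1) : dir;
    }

    /** Where the task writes its file before its output is committed. */
    static String workFile(String outputDir, String attemptId, String fileName) {
        return trim(outputDir) + "/_temporary/_" + attemptId + "/" + fileName;
    }

    /** Where the same file ends up after the output is committed. */
    static String committedFile(String outputDir, String fileName) {
        return trim(outputDir) + "/" + fileName;
    }

    public static void main(String[] args) {
        String out = "/user/mark/out"; // hypothetical job output directory
        String attempt = "attempt_200707121733_0003_m_000005_0";
        // /user/mark/out/_temporary/_attempt_200707121733_0003_m_000005_0/part-m-00005
        System.out.println(workFile(out, attempt, "part-m-00005"));
        // /user/mark/out/part-m-00005
        System.out.println(committedFile(out, "part-m-00005"));
    }
}
```

The point of the two paths is that a map task opening its own output in cleanup() must look in the work directory, because the file has not yet been moved to the job output directory at that point.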

Luca

On May 23, 2011 14:42:04 Joey Echeverria wrote:
> Hi Mark,
>
> FYI, I'm moving the discussion over to
> [EMAIL PROTECTED] since your question is specific to
> MapReduce.
>
> You can derive the output name from the TaskAttemptID, which you can
> get by calling getTaskAttemptID() on the context passed to your
> cleanup() function. The task attempt ID will look like this:
>
> attempt_200707121733_0003_m_000005_0
>
> You're interested in the m_000005 part. This gets translated into the
> output file name part-m-00005.
>
> -Joey
>
> On Sat, May 21, 2011 at 8:03 PM, Mark question <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> >  I'm running a map-only job, and at the end of each map task
> > (i.e. in the close() function) I want to open the file that the current
> > map has written using its OutputCollector.
> >
> >  I know "job.getWorkingDirectory()" would give me the parent path of the
> > file written, but how do I get the full path or the file name (i.e.
> > part-00000 or part-00001)?
> >
> > Thanks,
> > Mark
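Joey's suggestion, deriving the file name from the task attempt ID, can be sketched in plain Java as below. It reads the task type and task number from the end of the ID and zero-pads the number to five digits. Assumptions: new-API (mapreduce) names look like part-m-00005, as Joey describes, while old-API (mapred) names, like the part-00000 in the question, omit the m/r letter. PartFileName and fromAttemptId are hypothetical helpers, not Hadoop APIs; inside a real cleanup() the input string would come from context.getTaskAttemptID().toString().

```java
import java.util.Locale;

/**
 * Sketch of deriving an output file name from a task attempt ID such as
 * attempt_200707121733_0003_m_000005_0. Hypothetical helper, not a Hadoop API.
 */
final class PartFileName {

    private PartFileName() {}

    static String fromAttemptId(String attemptId, boolean newApi) {
        String[] parts = attemptId.split("_");
        if (parts.length < 6 || !parts[0].equals("attempt")) {
            throw new IllegalArgumentException("not a task attempt id: " + attemptId);
        }
        // Read fields from the end: ..._<m|r>_<task number>_<attempt number>
        String type = parts[parts.length - 3];
        if (!type.equals("m") && !type.equals("r")) {
            throw new IllegalArgumentException("unexpected task type: " + type);
        }
        int taskNumber = Integer.parseInt(parts[parts.length - 2]);         // "000005" -> 5
        String padded = String.format(Locale.ROOT, "%05d", taskNumber);     // 5 -> "00005"
        // Assumed naming: new API "part-m-00005", old API "part-00005".
        return newApi ? "part-" + type + "-" + padded : "part-" + padded;
    }

    public static void main(String[] args) {
        String id = "attempt_200707121733_0003_m_000005_0";
        System.out.println(fromAttemptId(id, true));  // part-m-00005
        System.out.println(fromAttemptId(id, false)); // part-00005
    }
}
```

Parsing from the end of the ID rather than by fixed position keeps the sketch independent of what the job-tracker identifier looks like (a timestamp here, but it can differ, e.g. in local mode).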

--
Luca Pireddu
CRS4 - Distributed Computing Group
Loc. Pixina Manna Edificio 1
Pula 09010 (CA), Italy
Tel:  +39 0709250452