Hadoop, mail # user - get name of file in mapper output directory

Re: get name of file in mapper output directory
Luca Pireddu 2011-05-25, 08:51
On May 25, 2011 00:28:10 Mark question wrote:
> Thanks to both of you for the comments. I finally managed to get the
> output file of the current mapper, but I couldn't use it because,
> apparently, mappers use a "_temporary" file while they are running. So in
> Mapper.close, the file it wrote to (e.g. "part-00000") does not exist yet.
> There has to be another way to get the produced file. I need to sort it
> immediately within the mappers.
> Again, your thoughts are really helpful!
> Mark

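Indeed, output is written to the _temporary directory and is moved into the
final output directory by the FileOutputCommitter only after the task has
completed, so the final part file is not there yet while the mapper is running.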

Why do you need to sort within the mappers?  Hadoop sorts as part of the
regular workflow.  In fact, notice that your reducer receives the keys in
sorted order.  You should probably look for a way to satisfy your goal by
adapting bits of the workflow pipeline.

Maybe you should tell us what you're trying to achieve.  If the regular sort
order isn't what you need, then just write a custom sort comparator class,
which you insert into the workflow with Job.setSortComparatorClass.  I can
point you to an example if you need.
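
As a rough illustration of the comparison logic such a comparator could carry,
here is a minimal sketch in plain Java. It is deliberately written without any
Hadoop classes so that it runs on its own. The class name DescendingKeyOrder
and the choice of descending numeric order are assumptions made for the
example, not something taken from the original question. In a real job the
same logic would typically live in a subclass of
org.apache.hadoop.io.WritableComparator that is registered with
Job.setSortComparatorClass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical example: orders numeric keys in descending order instead of
// the default ascending order. Plain Java only; no Hadoop classes are used.
class DescendingKeyOrder implements Comparator<Long> {

    @Override
    public int compare(Long a, Long b) {
        // Swapping the operands reverses the natural ascending order.
        return Long.compare(b, a);
    }

    // Returns a sorted copy of the keys, i.e. the order in which a reducer
    // would see them if this ordering were used as the sort comparator.
    static List<Long> sortedKeys(List<Long> keys) {
        List<Long> copy = new ArrayList<>(keys);
        copy.sort(new DescendingKeyOrder());
        return copy;
    }

    public static void main(String[] args) {
        // prints [10, 7, 3, 1]
        System.out.println(sortedKeys(Arrays.asList(3L, 10L, 1L, 7L)));
    }
}
```

The point of the sketch is that only the ordering rule changes. The sort
itself is still done by the framework during the shuffle, so nothing has to be
sorted by hand inside the mapper.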

Luca Pireddu
CRS4 - Distributed Computing Group
Loc. Pixina Manna Edificio 1
Pula 09010 (CA), Italy
Tel:  +39 0709250452