MapReduce >> mail # user >> multipleoutputs does not like speculative execution in map-only job


multipleoutputs does not like speculative execution in map-only job

With speculative execution enabled, Hadoop can run a task attempt on more
than one node. If the mapper uses MultipleOutputs, the second attempt (or
sometimes all attempts) fails to create the output file, because it is
already being created by another attempt:

attempt_1347286420691_0011_m_000000_0
attempt_1347286420691_0011_m_000000_1
..
fails with
Error: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:
failed to create file /cznewgen/segments/20120907190053/parse_db/-m-00000

In my code I am calling mos.write() with four arguments. This problem is
discussed in the Javadoc for FileOutputFormat.getWorkOutputPath(); would it
be possible to change MultipleOutputs to take advantage of that method?

Or would it be better to change FileOutputFormat.getUniqueFile() to append
the last digit of the attempt ID to the filename, producing unique names
such as /cznewgen/segments/20120907190053/parse_db/-m-00000_0 ?