-
multipleoutputs does not like speculative execution in map-only job
Radim Kolar 2012-09-12, 22:51
With speculative execution enabled, Hadoop can run attempts of the same task on more than one node. If the mapper uses MultipleOutputs, the second attempt (or sometimes even all of them) fails to create its output file because the file is already being created by another attempt:
attempt_1347286420691_0011_m_000000_0
attempt_1347286420691_0011_m_000000_1
..
fails with
Error: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /cznewgen/segments/20120907190053/parse_db/-m-00000
In my code I am using mos.write with 4 arguments. This problem is discussed in the javadoc for the FileOutputFormat function getWorkOutputPath; is it possible to change MultipleOutputs to take advantage of this function?
Or is it better to change FileOutputFormat.getUniqueFile() to append the last digit of the attempt ID to the file name, creating unique names such as /cznewgen/segments/20120907190053/parse_db/-m-00000_0?
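Radim's second proposal can be sketched in plain Java. `uniqueFile` below mimics the `<name>-<m|r>-<5-digit partition>` pattern that `FileOutputFormat.getUniqueFile()` produces (a simplified copy, not the Hadoop source). Assuming the base output path passed to mos.write was `/cznewgen/segments/20120907190053/parse_db/`, that pattern yields exactly the `-m-00000` file from the error, and both attempts compute the same name. `attemptNumber` and `proposedUniqueFile` are hypothetical helpers illustrating the suggested `_<attempt>` suffix; nothing in this thread says Hadoop adopted it.

```java
import java.util.Locale;

public class UniqueFileSketch {
    // Mimics the "<name>-<m|r>-<5-digit partition><extension>" naming of
    // FileOutputFormat.getUniqueFile() (simplified; an assumption, not the real code).
    static String uniqueFile(String name, boolean isMap, int partition, String extension) {
        return name + "-" + (isMap ? 'm' : 'r') + "-"
                + String.format(Locale.ROOT, "%05d", partition) + extension;
    }

    // Hypothetical helper: the trailing "_N" field of an attempt ID is the attempt number.
    static int attemptNumber(String attemptId) {
        return Integer.parseInt(attemptId.substring(attemptId.lastIndexOf('_') + 1));
    }

    // Hypothetical version of the proposal: suffix the attempt number so that
    // speculative attempts of the same task get distinct file names.
    static String proposedUniqueFile(String name, boolean isMap, int partition, String attemptId) {
        return uniqueFile(name, isMap, partition, "") + "_" + attemptNumber(attemptId);
    }

    public static void main(String[] args) {
        String base = "/cznewgen/segments/20120907190053/parse_db/";
        String a0 = "attempt_1347286420691_0011_m_000000_0";
        String a1 = "attempt_1347286420691_0011_m_000000_1";
        // Current naming: both attempts compute the same path, so the second create collides.
        System.out.println(uniqueFile(base, true, 0, ""));
        // Proposed naming: each attempt gets its own path.
        System.out.println(proposedUniqueFile(base, true, 0, a0));
        System.out.println(proposedUniqueFile(base, true, 0, a1));
    }
}
```

Note that unique names alone would only avoid the create collision: both attempts' files would then land in the final directory, which is the problem an output committer is meant to solve.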
-
Re: multipleoutputs does not like speculative execution in map-only job
Harsh J 2012-09-13, 03:30
Hey Radim,
Does your job use the FileOutputCommitter?
-- Harsh J
-
Re: multipleoutputs does not like speculative execution in map-only job
Radim Kolar 2012-09-13, 08:09
> Does your job use the FileOutputCommitter?
Yes. job.setOutputFormatClass(SequenceFileOutputFormat.class);
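Harsh's question matters because an output committer is what normally keeps speculative attempts apart. Below is a simplified local-filesystem model of that idea, not Hadoop's FileOutputCommitter: the `<output>/_temporary/<attemptID>` layout follows the path component Harsh mentions in his later reply (the real layout has extra levels), and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CommitterSketch {
    // Simplified per-attempt work directory: <output>/_temporary/<attemptId> (assumption).
    static Path workDir(Path output, String attemptId) {
        return output.resolve("_temporary").resolve(attemptId);
    }

    // Snapshot a directory listing so we can move/delete entries safely afterwards.
    static List<Path> listFiles(Path dir) throws IOException {
        try (Stream<Path> s = Files.list(dir)) {
            return s.collect(Collectors.toList());
        }
    }

    // Each attempt writes into its own directory, so two speculative attempts
    // can use the same file name without colliding.
    static Path writeAttempt(Path output, String attemptId, String fileName, String data)
            throws IOException {
        Path dir = Files.createDirectories(workDir(output, attemptId));
        return Files.write(dir.resolve(fileName), data.getBytes(StandardCharsets.UTF_8));
    }

    // Commit: move the winning attempt's files into the final output directory.
    static void commitTask(Path output, String attemptId) throws IOException {
        for (Path f : listFiles(workDir(output, attemptId))) {
            Files.move(f, output.resolve(f.getFileName()));
        }
    }

    // Abort: discard the losing attempt's work directory.
    static void abortTask(Path output, String attemptId) throws IOException {
        Path dir = workDir(output, attemptId);
        for (Path f : listFiles(dir)) {
            Files.delete(f);
        }
        Files.delete(dir);
    }

    public static void main(String[] args) throws IOException {
        Path output = Files.createTempDirectory("job-output");
        String a0 = "attempt_1347286420691_0011_m_000000_0";
        String a1 = "attempt_1347286420691_0011_m_000000_1";
        writeAttempt(output, a0, "part-m-00000", "from attempt 0");
        writeAttempt(output, a1, "part-m-00000", "from attempt 1"); // no clash
        commitTask(output, a1); // suppose attempt 1 finished first
        abortTask(output, a0);
        byte[] bytes = Files.readAllBytes(output.resolve("part-m-00000"));
        System.out.println(new String(bytes, StandardCharsets.UTF_8)); // from attempt 1
    }
}
```

In this model the final file name is claimed only at commit time, so the two attempts never try to create the same path, provided each attempt really writes inside its own work directory.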
-
Re: multipleoutputs does not like speculative execution in map-only job
Robert Evans 2012-09-13, 15:31
What version of Hadoop is this on?
-
Re: multipleoutputs does not like speculative execution in map-only job
Radim Kolar 2012-09-13, 19:51
> What version of Hadoop is this on?
branch-0.23
-
Re: multipleoutputs does not like speculative execution in map-only job
Harsh J 2012-09-14, 17:31
Hold on. I do not see a _temporary/attemptID component in the path the error reports. Is MO really doing this, or are you getting the filename manually from somewhere? MO builds the file paths on its own, so there's no need to use unique-path calls or the like.
Sorry I didn't look at this carefully before. If you can share a reproducible test-case job, that'd be of great help.
On Fri, Sep 14, 2012 at 8:36 PM, Robert Evans <[EMAIL PROTECTED]> wrote:
> In 0.23 and branch-2 there were a lot of changes that went into the
> FileOutputFormat to be able to allow for AppMaster recovery. It is very
> likely that this is a regression from the 1.0 line. Do you know if this
> works on 1.0?
-- Harsh J
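Harsh's observation about the missing _temporary/attemptID component can be illustrated with URI resolution. Roughly, FileOutputFormat places each work file at `new Path(workPath, uniqueName)`, and Hadoop's `Path(Path, Path)` constructor resolves the child against the parent as a URI. The sketch below mimics that with `java.net.URI` alone; the class name and the work-directory string are hypothetical. Assuming the base output path handed to mos.write was absolute (the error path suggests so, but the thread does not confirm it), the attempt directory drops out of the result, and every attempt ends up creating the same final file.

```java
import java.net.URI;

public class PathResolutionSketch {
    // Mimics Hadoop's Path(Path parent, Path child): ensure the parent ends
    // with "/", then resolve the child URI against it (simplified sketch).
    static String resolve(String parent, String child) {
        URI p = URI.create(parent.endsWith("/") ? parent : parent + "/");
        return p.resolve(URI.create(child)).toString();
    }

    public static void main(String[] args) {
        // Hypothetical task-attempt work directory under the job output directory.
        String work = "hdfs://namenode/out/_temporary/attempt_1347286420691_0011_m_000000_0";
        // Relative base name: stays inside the attempt's private directory.
        System.out.println(resolve(work, "parse_db/-m-00000"));
        // Absolute base name: the work directory is discarded, so all
        // attempts target the same final path.
        System.out.println(resolve(work, "/cznewgen/segments/20120907190053/parse_db/-m-00000"));
    }
}
```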