AnilKumar B 2012-11-21, 15:07
Radim Kolar 2012-11-21, 15:44
Re: When speculative execution is true, there is a data loss issue with MultipleOutputs
Radim Kolar 2012-11-21, 15:31
On 21.11.2012 16:07, AnilKumar B wrote:
> Thanks Radim.
> Yes, as you said we are not writing into sub-directory of main job. I
> will try by making them as sub-directories of output dir.
> But one question: when I turn off speculative execution, it works
> fine with the same multiple output directory structure. May I
> know how exactly it works in this case?
> When we change the speculative execution flag, why exactly is there a
> difference in the output data?
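Because if you are not using MultipleOutputs, you are not writing to the
real file but to a file whose name is generated from the task attempt ID,
in a tmp subdirectory of the job output. Attempts therefore do not
overwrite each other. With speculative execution on, two attempts of the
same task can run at once; if both write to the same fixed path outside
the job output directory, they collide, because in HDFS you can have only
one writer per file.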
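
To illustrate the idea, here is a minimal sketch in plain Java that imitates per-attempt temporary output on a local file system. It is not Hadoop code: the class name, the attempt IDs, the `_temporary` layout and the `commit` helper are all assumptions made for the example, and real Hadoop decides which attempt may commit through its own output committer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class AttemptOutputSketch {

    // Each attempt writes under its own directory, keyed by a hypothetical
    // attempt ID, so two attempts of the same task never share a file.
    public static Path writeAttempt(Path outputDir, String attemptId,
                                    String fileName, String data) throws IOException {
        Path attemptDir = outputDir.resolve("_temporary").resolve(attemptId);
        Files.createDirectories(attemptDir);
        Path file = attemptDir.resolve(fileName);
        Files.write(file, data.getBytes(StandardCharsets.UTF_8));
        return file;
    }

    // Promote one attempt's file to its final name. If another attempt has
    // already been committed, discard this one instead of overwriting it.
    public static boolean commit(Path attemptFile, Path outputDir) throws IOException {
        Path target = outputDir.resolve(attemptFile.getFileName());
        if (Files.exists(target)) {
            Files.delete(attemptFile);
            return false;
        }
        Files.move(attemptFile, target);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("job-output");

        // Two speculative attempts of the same task produce the same file name.
        Path a0 = writeAttempt(out, "attempt_0", "part-r-00000", "from attempt 0");
        Path a1 = writeAttempt(out, "attempt_1", "part-r-00000", "from attempt 1");

        System.out.println("attempt 0 committed: " + commit(a0, out));
        System.out.println("attempt 1 committed: " + commit(a1, out));

        byte[] bytes = Files.readAllBytes(out.resolve("part-r-00000"));
        System.out.println("final content: " + new String(bytes, StandardCharsets.UTF_8));
    }
}
```

Both attempts write safely because their paths differ, and only one of them ends up under the final name. If each attempt instead opened the same fixed path directly, which is what happens when extra outputs point outside the job output directory, nothing would keep the two writers apart.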