MultipleOutputs is not working properly when dfs.block.size is changed
Dino Kečo 2011-08-18, 08:30
I have been working on Hadoop jobs that write their output into multiple
files. In the Hadoop API I found the class MultipleOutputs, which implements this.
My use case is to change the HDFS block size in one job to increase parallelism,
and I am doing that using the dfs.block.size configuration property. Part of
the output file is missing when I change this property (the last couple of lines;
in some cases half of a line is missing).
I did some debugging, and everything looks fine before the call to outputs.write
("sucessfull", KEY, VALUE);
For the output format I am using TextOutputFormat.
When I remove MultipleOutputs from my code, everything works fine.
Is there something I am doing wrong, or is there an issue with multiple outputs?
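For reference, the symptom described above (the tail of the file missing, sometimes cut in the middle of a line) is what a buffered writer typically produces when it is never closed. MultipleOutputs does expose a close() method, but whether a missing call to it is the cause here is only an assumption. The sketch below uses plain java.io/java.nio rather than Hadoop, and the class and method names (UnclosedWriterDemo, sizesBeforeAndAfterClose) are hypothetical; it only shows how much data can still sit in a buffer before close() is called.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-alone demo, not Hadoop code: it mimics a text record
// writer that buffers "key<TAB>value" lines before they reach the file.
public class UnclosedWriterDemo {

    // Writes `lines` records and returns { bytes on disk before close(),
    // bytes on disk after close() }.
    static long[] sizesBeforeAndAfterClose(Path file, int lines) throws IOException {
        BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
        long before;
        try {
            for (int i = 0; i < lines; i++) {
                writer.write("key" + i + "\tvalue" + i + "\n");
            }
            // At this point the last records are still held in memory.
            before = Files.size(file);
        } finally {
            // close() flushes the remaining buffered records to the file.
            writer.close();
        }
        return new long[] { before, Files.size(file) };
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("unclosed-writer-demo", ".txt");
        long[] sizes = sizesBeforeAndAfterClose(file, 1000);
        System.out.println("bytes on disk before close(): " + sizes[0]);
        System.out.println("bytes on disk after close():  " + sizes[1]);
        Files.delete(file);
    }
}
```

If the same mechanism were at work in the job, the place to check would be whether close() is called on the MultipleOutputs instance once the task has finished writing.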
Harsh J 2011-08-18, 10:09
Dino Kečo 2011-08-18, 10:53