MapReduce user mailing list

MultipleOutputs is not working properly when dfs.block.size is changed
Hi all,

I have been working on Hadoop jobs that write output to multiple
files. In the Hadoop API I found the MultipleOutputs class, which implements this
functionality.

My use case is to change the HDFS block size in one job to increase parallelism,
and I am doing that with the dfs.block.size configuration property. Part of the
output file goes missing when I change this property (the last couple of lines,
and in some cases half of a line).
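For reference, here is a rough model of how block size turns into map-task parallelism for a later job that reads these files. It assumes a FileInputFormat-based input format with the default minimum and maximum split sizes. The rule below (split size = max(minSize, min(maxSize, blockSize)), with the last split allowed to be up to 10% larger) is a simplified reading of FileInputFormat, not a copy of it. FileInputFormat uses the block size recorded for each input file, so dfs.block.size set in one job affects how the files written by that job are split later. The class name SplitEstimate is hypothetical.

```java
// Hypothetical helper, a simplified model of FileInputFormat's split logic (an assumption,
// not Hadoop code): estimates how many splits, and hence map tasks, one splittable file yields.
final class SplitEstimate {
    // The last split may be up to 10% larger than splitSize instead of creating a tiny extra split.
    static final double SPLIT_SLOP = 1.1;

    private SplitEstimate() {}

    // splitSize = max(minSize, min(maxSize, blockSize))
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Counts splits for a single file of fileLength bytes stored with the given block size.
    static int countSplits(long fileLength, long blockSize) {
        // Assumed defaults: effective minimum split size of 1 byte, no maximum.
        long size = splitSize(blockSize, 1L, Long.MAX_VALUE);
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / size > SPLIT_SLOP) {
            splits++;
            remaining -= size;
        }
        if (remaining != 0) {
            splits++; // whatever is left becomes the final split
        }
        return splits;
    }
}
```

With these assumptions a 1 GB file gives 8 splits at a 128 MB block size and 32 splits at 32 MB, which is the parallelism gain the block-size change is meant to buy.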

I debugged it, and everything looks fine right before the call to
outputs.write("sucessfull", KEY, VALUE);
For the output format I am using TextOutputFormat.
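The symptom (the last lines, or half of a line, missing) is what a buffered writer produces when it is never flushed or closed. The sketch below illustrates that with plain java.io only; ToyNamedOutputs is a hypothetical toy, NOT Hadoop's MultipleOutputs. It routes each record to one buffered "file" per named output and formats records like TextOutputFormat's default (key, tab, value, newline). Whether MultipleOutputs' record writers behave exactly this way in this job is an assumption.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy, NOT org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.
final class ToyNamedOutputs implements AutoCloseable {
    private final Map<String, ByteArrayOutputStream> files = new LinkedHashMap<>();
    private final Map<String, Writer> writers = new LinkedHashMap<>();

    // Opens a buffered writer per name on first use, then appends "key<TAB>value\n".
    void write(String name, String key, String value) {
        try {
            Writer w = writers.get(name);
            if (w == null) {
                ByteArrayOutputStream file = new ByteArrayOutputStream();
                files.put(name, file);
                w = new BufferedWriter(new OutputStreamWriter(file, StandardCharsets.UTF_8));
                writers.put(name, w);
            }
            w.write(key + "\t" + value + "\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns only what has actually reached the underlying "file" so far.
    String contents(String name) {
        ByteArrayOutputStream file = files.get(name);
        return file == null ? "" : new String(file.toByteArray(), StandardCharsets.UTF_8);
    }

    // Flushes and closes every writer; anything still buffered is lost if this never runs.
    @Override
    public void close() {
        try {
            for (Writer w : writers.values()) {
                w.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

After write("sucessfull", "KEY", "VALUE") the underlying file is still empty; only close() pushes "KEY\tVALUE\n" through. As far as I know, the MultipleOutputs javadoc example likewise calls mos.close() from the task's cleanup() method for this reason.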

When I remove MultipleOutputs from my code, everything works OK.

Is there something I am doing wrong, or is there an issue with MultipleOutputs?

regards,
dino