MapReduce >> mail # user >> MultipleOutputs is not working properly when dfs.block.size is changed


MultipleOutputs is not working properly when dfs.block.size is changed
Hi all,

I have been working on Hadoop jobs that write their output into multiple
files. In the Hadoop API I found the class MultipleOutputs, which implements
this functionality.

My use case is to change the HDFS block size in one job to increase
parallelism, which I do via the dfs.block.size configuration property. When I
change this property, part of the output file goes missing (the last couple
of lines; in some cases half of a line is missing).

I did some debugging, and everything looks fine right up to the call to
outputs.write("sucessfull", KEY, VALUE);
For the output format I am using TextOutputFormat.

When I remove MultipleOutputs from my code, everything works fine.
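For reference, the setup looks roughly like this. This is a minimal sketch, not the exact job (the class names and the reduce logic are illustrative; only the named output "sucessfull", TextOutputFormat, and the dfs.block.size override come from the description above):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MultiOutputJob {

    public static class MyReducer extends Reducer<Text, Text, Text, Text> {
        private MultipleOutputs<Text, Text> outputs;

        @Override
        protected void setup(Context context) {
            outputs = new MultipleOutputs<Text, Text>(context);
        }

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text value : values) {
                // Write to the named output instead of context.write(...)
                outputs.write("sucessfull", key, value);
            }
        }

        @Override
        protected void cleanup(Context context)
                throws IOException, InterruptedException {
            // MultipleOutputs keeps its own record writers; close() flushes
            // and closes all of them. If it is never called, the buffered
            // tail of each output file can be lost.
            outputs.close();
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Override the HDFS block size for this job (64 MB here).
        conf.setLong("dfs.block.size", 64 * 1024 * 1024L);

        Job job = Job.getInstance(conf, "multiple-outputs-example");
        job.setJarByClass(MultiOutputJob.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // Register the named output used in the reducer.
        MultipleOutputs.addNamedOutput(job, "sucessfull",
                TextOutputFormat.class, Text.class, Text.class);

        // Input/output paths omitted.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```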

Is there something I am doing wrong, or is there an issue with
MultipleOutputs?

regards,
dino