No tool that uses MapReduce as its underlying execution engine produces a single file when it runs multiple reducers, because MapReduce itself writes one output file per reducer. The real question is: if you want to keep the data on Hadoop, why does it matter whether it's a single file? Most applications on Hadoop accept a directory as input and read all the files in it.
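If a single HDFS file really is required, one possible sketch is to stream the part files straight back into HDFS rather than going through getmerge and a local copy. The function name merge_parts_in_hdfs and the destination path are hypothetical, and it assumes the data files follow the usual part-* naming. It relies only on "hadoop fs -cat" accepting a glob and "hadoop fs -put -" reading from standard input.

```shell
# Hypothetical helper: concatenate the part-* files of an HDFS output
# directory into one HDFS file without writing a local copy.
merge_parts_in_hdfs() {
    src_dir="$1"    # the directory given to STORE ... INTO, e.g. xxxx
    dst_file="$2"   # HDFS path for the merged file; assumed not to exist yet

    # The quoted glob is expanded by Hadoop, not by the local shell.
    # It matches part-m-* and part-r-* files, so _SUCCESS and the
    # .pig_schema metadata file are left out and stay in place.
    # "-put -" uploads whatever arrives on standard input.
    hadoop fs -cat "$src_dir/part-*" | hadoop fs -put - "$dst_file"
}
```

It would be called as, for example, "merge_parts_in_hdfs xxxx xxxx_merged.tsv". The data still flows through the one machine running the command, so the copy is not parallel, and the pipeline's exit status reflects only the -put side. Treat it as a convenience for modest outputs rather than a replacement for leaving the directory as is.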
On May 24, 2013, at 12:11 PM, Mix Nin wrote:
> The STORE command produces multiple output files. I want a single output file,
> so I tried the command below:
> STORE (foreach (group NoNullData all) generate flatten($1)) into 'xxxx';
> This command produces a single file, but it also forces the job to use a
> single reducer, which kills performance.
> How do I get around this?
> Normally the STORE command produces multiple output files. Apart from those, I
> also see a "_SUCCESS" file in the output directory. I am also generating a
> metadata file (using PigStorage('\t', '-schema')) in the output directory.
> I thought of using getmerge as follows
> hadoop fs -getmerge <dir_of_input_files> <local file>
> But this has several drawbacks:
> 1) I need to remove everything other than the data files from the HDFS directory.
> 2) It creates a single file in a local directory, not in the HDFS directory.
> 3) I need to move the file from the local directory back to HDFS, which
> may take additional time, depending on the size of the single file.
> 4) I need to put back the files that I removed in step 1.
> Is there an efficient way to solve my problem?