The STORE command produces multiple output files. I want a single output
file, so I tried the following command:
STORE (foreach (group NoNullData all) generate flatten($1)) into 'xxxx';
This command produces a single file, but it also forces the job to use a
single reducer, which kills performance.
How do I get around this?
Normally the STORE command produces multiple output files. Apart from
those, I see another file, "_SUCCESS", in the output directory. I am also
generating a metadata file (using PigStorage('\t', '-schema')) in the
output directory.
I thought of using getmerge as follows:
hadoop fs -getmerge <dir_of_input_files> <local file>
But this approach has several drawbacks:
1) I have to remove the files other than the data files from the HDFS
   directory first.
2) It creates the single file in a local directory, not in the HDFS
   directory.
3) I then have to move the file from the local directory back to the HDFS
   directory, which may take additional time depending on the size of the
   single file.
4) I have to put back the files that I removed in step 1.
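
For reference, here is a minimal sketch of a variant of the steps above that
swaps getmerge for "hadoop fs -cat" piped into "hadoop fs -put -", so the
merged data is written straight back to HDFS without a local copy. It assumes
the data files are named part-* (the usual naming for STORE output), so that
_SUCCESS and the schema files are skipped by the glob. The function name
merge_parts, the HADOOP override variable, and the destination path are
hypothetical names used only for illustration. The data still streams through
the client machine, so I have not confirmed that this is efficient for large
outputs.

```shell
#!/bin/sh
# Hypothetical helper: concatenate the part files of a STORE output
# directory into a single HDFS file.
# HADOOP may be set to another executable; it defaults to `hadoop`.
merge_parts() {
    src_dir=$1    # HDFS directory written by STORE, e.g. xxxx
    dst_file=$2   # HDFS path for the merged file (assumed not to exist yet)

    # The quoted glob is expanded by Hadoop, not by the local shell, and
    # matches only the data files (assumption: they are named part-*).
    # `-put -` reads the data to upload from standard input.
    # The function returns the exit status of the `-put` side of the pipe.
    "${HADOOP:-hadoop}" fs -cat "$src_dir/part-*" |
        "${HADOOP:-hadoop}" fs -put - "$dst_file"
}
```

It could be called as, for example, merge_parts xxxx xxxx_merged/data, where
xxxx_merged/data is a made-up destination.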
Is there a more efficient way to solve my problem?