Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Pig >> mail # user >> Single Output file from STORE command

Copy link to this message
Single Output file from STORE command
STORE command produces multiple output files. I want a single output file
and I tried using command as below

STORE (foreach (group NoNullData all) generate flatten($1))  into 'xxxx';

This command produces one single file but at the same time forces to use
single reducer which kills performance.

How do I overcome the scenario?

Normally   STORE command produces multiple output files, apart from that I
see another file
"_SUCCESS" in output directory. I ma generating metadata file  ( using
PigStorage('\t', '-schema') ) in output directory

I thought of using  getmerge as follows

*hadoop* fs -*getmerge* <dir_of_input_files>   <local file>

But this requires
1)eliminating files other than data files in HDFS directory
2)It creates a single file in local directory but not in HDFS directory
3)I need to again move file from local directory to HDFS directory which
may  take additional time , depending on size of single file
4)I need to agin place the files which I eliminated in Step 1
Is there an efficient way for my problem?

Alan Gates 2013-05-28, 15:29
Aniket Mokashi 2013-06-03, 07:34