Pig, mail # user - Single Output file from STORE command


Re: Single Output file from STORE command
Alan Gates 2013-05-28, 15:29
Nothing that uses MapReduce as its underlying execution engine creates a single file when running multiple reducers, because MapReduce itself doesn't: each reducer writes its own part file. The real question is: if you want to keep the file on Hadoop, why worry about whether it's a single file? Most applications on Hadoop will take a directory as input and read all the files contained in it.

Alan.

On May 24, 2013, at 12:11 PM, Mix Nin wrote:

> The STORE command produces multiple output files. I want a single output file,
> so I tried the command below:
>
> STORE (foreach (group NoNullData all) generate flatten($1))  into 'xxxx';
>
> This command produces a single file, but it also forces the job to use a
> single reducer, which kills performance.
>
> How do I get around this?
>
> Normally the STORE command produces multiple output files. Apart from those, I
> see another file,
> "_SUCCESS", in the output directory. I am also generating a metadata file (using
> PigStorage('\t', '-schema')) in the output directory.
>
> I thought of using  getmerge as follows
>
> hadoop fs -getmerge <dir_of_input_files> <local file>
>
> But this has several drawbacks:
> 1) It requires eliminating files other than data files from the HDFS directory.
> 2) It creates a single file in a local directory, not in the HDFS directory.
> 3) I need to move the file from the local directory back to HDFS, which
> may take additional time, depending on the size of the file.
> 4) I need to put back the files I eliminated in step 1.
>
>
> Is there an efficient way to solve my problem?
>
> Thanks
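
If a single file really is needed, the four steps listed in the quoted message can in principle be collapsed by concatenating only the data files and skipping the local copy. The sketch below is a local stand-in that runs without a cluster: the function name `merge_parts`, the demo file names, and the sample rows are all hypothetical. On HDFS the analogous pipeline would be `hadoop fs -cat 'out_dir/part-*' | hadoop fs -put - merged_dir/merged.tsv` (paths hypothetical, not run here); the data would still stream through the client machine, but nothing would be written to local disk and the `_SUCCESS` and schema files would stay in place.

```shell
#!/bin/sh
# Local simulation only: a temp directory stands in for the HDFS output directory.

merge_parts() {
    # $1: directory holding part-* files; $2: destination file
    src_dir=$1
    dest=$2
    # The part-* glob matches only data files, so _SUCCESS and .pig_schema
    # are neither read nor removed.
    cat "$src_dir"/part-* > "$dest"
}

# Build a fake STORE output directory (file names and contents are made up).
demo_dir=$(mktemp -d)
printf 'a\t1\n' > "$demo_dir/part-r-00000"
printf 'b\t2\n' > "$demo_dir/part-r-00001"
: > "$demo_dir/_SUCCESS"
printf '{"fields":[]}\n' > "$demo_dir/.pig_schema"

# Merge the data files into one file next to the directory and show the result.
merge_parts "$demo_dir" "$demo_dir.merged"
cat "$demo_dir.merged"
```

Because the glob expands in lexical order, the merged file keeps the part files in reducer order. As the reply points out, this step is only worth doing if a downstream consumer cannot read a directory.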