Pig >> mail # user >> Single Output file from STORE command


Mix Nin 2013-05-24, 19:11
Re: Single Output file from STORE command
Nothing that uses MapReduce as an underlying execution engine creates a single file when running multiple reducers, because MapReduce doesn't: each reducer writes its own part file. The real question is: if you want to keep the file on Hadoop, why worry about whether it's a single file? Most applications on Hadoop will take a directory as input and read all the files contained in it.

Alan.

On May 24, 2013, at 12:11 PM, Mix Nin wrote:

> The STORE command produces multiple output files. I want a single output
> file, and I tried the command below:
>
> STORE (foreach (group NoNullData all) generate flatten($1))  into 'xxxx';
>
> This command produces a single file, but it also forces the use of a
> single reducer, which kills performance.
>
> How do I get around this?
>
> Normally the STORE command produces multiple output files. Apart from
> those, I see another file, "_SUCCESS", in the output directory. I am also
> generating a metadata file (using PigStorage('\t', '-schema')) in the
> output directory.
>
> I thought of using getmerge as follows:
>
> hadoop fs -getmerge <dir_of_input_files> <local file>
>
> But this has several drawbacks:
> 1) I have to remove the files other than data files from the HDFS directory.
> 2) It creates a single file in a local directory, not in HDFS.
> 3) I then have to move the file from the local directory back to HDFS,
> which may take additional time depending on the size of the file.
> 4) I have to put back the files I removed in step 1.
>
>
> Is there an efficient way for my problem?
>
> Thanks
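
One possible way around the getmerge steps listed above, as a rough sketch rather than a recommendation: stream the part files through the client and write them straight back to HDFS with `hadoop fs -put -`, which reads from standard input. The helper name `merge_parts` and the example paths are hypothetical, and the sketch assumes the data files follow the usual `part-*` naming, so that `_SUCCESS` and the schema file are never matched and do not have to be moved out of the way.

```shell
#!/bin/sh
# Sketch: merge the part files of a STORE output directory into one HDFS
# file without staging a copy on the local disk.
# merge_parts is a hypothetical helper name; the paths are placeholders.
merge_parts() {
    src_dir=$1    # HDFS directory written by STORE, e.g. xxxx
    dst_file=$2   # HDFS path for the merged file; keep it outside src_dir

    # The quoted glob is expanded by the hadoop client against HDFS, not by
    # the local shell. It matches only the data files (assumed to be named
    # part-*), so _SUCCESS and the schema file stay where they are.
    # "-put -" tells the client to read the data from standard input.
    hadoop fs -cat "$src_dir/part-*" | hadoop fs -put - "$dst_file"
}
```

It could be called as `merge_parts xxxx xxxx_merged/data.tsv`. All the data still flows through a single client process, so for large outputs this is serial too; it only removes the local copy and the file shuffling, and the job itself keeps its multiple reducers.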
Aniket Mokashi 2013-06-03, 07:34