Pig >> mail # user >> Single Output file from STORE command


Single Output file from STORE command
The STORE command produces multiple output files. I want a single output
file, so I tried the following command:

STORE (foreach (group NoNullData all) generate flatten($1))  into 'xxxx';

This command produces a single file, but it also forces the job to use a
single reducer, which kills performance.

How do I get around this?

Normally the STORE command produces multiple output files. Apart from those,
I see another file, "_SUCCESS", in the output directory. I am also generating
a metadata file (using PigStorage('\t', '-schema')) in the output directory.

I thought of using getmerge as follows:

hadoop fs -getmerge <dir_of_input_files> <local file>

But this approach has several drawbacks:
1) It requires removing the files other than data files from the HDFS
directory.
2) It creates the single file in a local directory, not in HDFS.
3) I then need to move the file from the local directory back to HDFS, which
may take additional time depending on the size of the file.
4) I need to put back the files that I removed in step 1.
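
To make the steps above concrete, here is a rough sketch of the round trip as a shell function. It is only an illustration under assumptions: the function name merge_store_output, all of its arguments, and the side directory used to park the non-data files are hypothetical, and the list of non-data files has to be passed in by the caller.

```shell
# Sketch only: getmerge round trip for a Pig STORE output directory.
# All names below are hypothetical placeholders, not part of Pig or Hadoop.
merge_store_output() {
  local out_dir="$1"    # HDFS directory written by STORE
  local merged="$2"     # HDFS path for the single merged file
  local local_tmp="$3"  # scratch file on the local disk
  local side_dir="$4"   # HDFS directory used to park non-data files
  shift 4               # remaining arguments: names of the non-data files
  local f

  # Step 1: move the non-data files (e.g. _SUCCESS) out of the way.
  hadoop fs -mkdir "$side_dir" || return 1
  for f in "$@"; do
    hadoop fs -mv "$out_dir/$f" "$side_dir/$f" || return 1
  done

  # Step 2: merge the data files into one file on the local disk.
  hadoop fs -getmerge "$out_dir" "$local_tmp" || return 1

  # Step 3: copy the merged file from the local disk back into HDFS.
  hadoop fs -put "$local_tmp" "$merged" || return 1

  # Step 4: put the non-data files back where they were.
  for f in "$@"; do
    hadoop fs -mv "$side_dir/$f" "$out_dir/$f" || return 1
  done
}
```

A call might look like: merge_store_output xxxx xxxx_merged /tmp/xxxx.tsv xxxx_side _SUCCESS (with the schema file name added to the list). Steps 2 and 3 both push the full data set through the local disk, which is the cost I would like to avoid.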
Is there a more efficient way to solve my problem?

Thanks
Alan Gates 2013-05-28, 15:29
Aniket Mokashi 2013-06-03, 07:34