Mix Nin 2013-05-24, 19:11
Alan Gates 2013-05-28, 15:29
You can use Pig to do what "hadoop fs -getmerge" does, in a separate Pig
script. It will still be one reducer, though.
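A minimal sketch of such a follow-on script, assuming the parallel STORE wrote its parts to 'xxxx' (the path from the original question) and that 'xxxx_merged' is a hypothetical destination for the merged copy; the part-* glob keeps _SUCCESS and the schema file out of the load:

```
-- second script, run after the parallel STORE has finished;
-- 'xxxx' and 'xxxx_merged' are placeholder paths
data    = LOAD 'xxxx/part-*' USING PigStorage('\t');  -- glob skips _SUCCESS and .pig_schema
grouped = GROUP data ALL;                             -- collapses everything onto one reducer
merged  = FOREACH grouped GENERATE FLATTEN($1);
STORE merged INTO 'xxxx_merged' USING PigStorage('\t');
```

The merge stays entirely in HDFS, so there is no round trip through the local filesystem, but the GROUP ALL still funnels all data through a single reducer.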
On Tue, May 28, 2013 at 8:29 AM, Alan Gates <[EMAIL PROTECTED]> wrote:
> Nothing that uses MapReduce as an underlying execution engine creates a
> single file when running multiple reducers because MapReduce doesn't. The
> real question is if you want to keep the file on Hadoop, why worry about
> whether it's a single file? Most applications on Hadoop will take a
> directory as an input and read all the files contained in it.
> On May 24, 2013, at 12:11 PM, Mix Nin wrote:
> > The STORE command produces multiple output files. I want a single output
> > file, so I tried the command below:
> > STORE (foreach (group NoNullData all) generate flatten($1)) into 'xxxx';
> > This command produces one single file, but at the same time it forces the
> > use of a single reducer, which kills performance.
> > How do I get around this?
> > Normally the STORE command produces multiple output files; apart from
> > those, there is also a "_SUCCESS" file in the output directory. I am
> > also generating a metadata file (using PigStorage('\t', '-schema'))
> > in the output directory.
> > I thought of using getmerge as follows:
> > hadoop fs -getmerge <dir_of_input_files> <local file>
> > But this approach has drawbacks:
> > 1) I have to eliminate the files other than the data files from the HDFS directory
> > 2) It creates a single file in the local directory, but not in the HDFS directory
> > 3) I then need to move the file from the local directory back to the HDFS
> > directory, which may take additional time depending on the size of the file
> > 4) I need to put back the files that I eliminated in step 1
> > Is there a more efficient way to solve this?
> > Thanks