Hadoop >> mail # user >> Merge Reducers Output


Re: Merge Reducers Output
It's not clear to me that you need custom input formats...

1) Getmerge might work, or

2) Simply run a SINGLE reducer job (have every mapper emit the same
constant key, e.g. a static final int key = 1, or just specify numReducers=1).

In this case, only one reducer will be called, and it will read through all
the values.
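
For option 2, here is a minimal driver sketch, assuming 2012-era Hadoop with
the new (org.apache.hadoop.mapreduce) API and plain-text output files; the
class and argument names are illustrative, not from Mike's actual job:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SingleReducerMerge {

    // Emit every line under the same (null) key; with one reduce task,
    // the default identity Reducer then writes all lines to one file.
    public static class LineMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            context.write(NullWritable.get(), line);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "merge-to-one-file");
        job.setJarByClass(SingleReducerMerge.class);
        job.setMapperClass(LineMapper.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        job.setNumReduceTasks(1); // one reducer => exactly one output file
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The NullWritable key also keeps TextOutputFormat from prepending a key and
tab separator to every merged line.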

On Tue, Jul 31, 2012 at 12:30 AM, Bejoy KS <[EMAIL PROTECTED]> wrote:

> Hi
>
> Why not use 'hadoop fs -getmerge <outputFolderInHdfs>
> <targetFileNameInLfs>' while copying files out of HDFS for the end users to
> consume? This will merge all the files in 'outputFolderInHdfs' into one
> file and put it in the local file system (LFS).
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
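
The merge that 'hadoop fs -getmerge' performs can also be run from Java
through FileUtil.copyMerge, which existed in the Hadoop releases of this era
(it was later removed in Hadoop 3). A minimal sketch, with purely
illustrative paths:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class GetmergeEquivalent {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(conf);
        FileSystem local = FileSystem.getLocal(conf);
        // Concatenate every file under the HDFS output folder into one
        // file on the local file system, like 'hadoop fs -getmerge'.
        FileUtil.copyMerge(hdfs, new Path("/user/mike/job-output"),
                local, new Path("/tmp/merged-result"),
                false /* don't delete the sources */, conf, null);
    }
}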
>
> -----Original Message-----
> From: Michael Segel <[EMAIL PROTECTED]>
> Date: Mon, 30 Jul 2012 21:08:22
> To: <[EMAIL PROTECTED]>
> Reply-To: [EMAIL PROTECTED]
> Subject: Re: Merge Reducers Output
>
> Why not use a combiner?
>
> On Jul 30, 2012, at 7:59 PM, Mike S wrote:
>
> > Like others have asked several times, I need to merge my reducers'
> > output files. Imagine I have many reducers which will generate 200
> > files. Now, to merge them together, I have written another MapReduce
> > job where each mapper reads a complete file into memory and outputs
> > it, and then a single reducer merges them all together. To do so, I
> > had to write a custom FileInputFormat that reads the complete file
> > into memory (see the sketch at the end of this page) and another
> > custom FileOutputFormat to append each item's bytes together. This is
> > how my mapper and reducer look:
> >
> > public static class MapClass extends Mapper<NullWritable,
> > BytesWritable, NullWritable, BytesWritable>
> > {
> >     // Identity map: pass each whole file straight through under the
> >     // null key so every file's bytes reach the single reducer.
> >     @Override
> >     public void map(NullWritable key, BytesWritable value, Context context)
> >             throws IOException, InterruptedException
> >     {
> >         context.write(key, value);
> >     }
> > }
> >
> > public static class Reduce extends Reducer<NullWritable,
> > BytesWritable, NullWritable, BytesWritable>
> > {
> >     // All files' bytes arrive here as one value list; writing them
> >     // out in sequence produces the single merged output file.
> >     @Override
> >     public void reduce(NullWritable key, Iterable<BytesWritable> values,
> >             Context context) throws IOException, InterruptedException
> >     {
> >         for (BytesWritable value : values)
> >         {
> >             context.write(NullWritable.get(), value);
> >         }
> >     }
> > }
> >
> > I still have to have one reducer and that is a bottleneck. Please
> > note that I must do this merging, as the users of my MR job are outside
> > my Hadoop environment and need the result as one file.
> >
> > Is there a better way to merge reducers' output files?
> >
>
>
--
Jay Vyas
MMSB/UCHC
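
For reference, the kind of whole-file input format Mike describes usually
looks like the sketch below (new API; the class names are illustrative, not
his actual code). It marks files as unsplittable so each mapper receives
exactly one file, read fully into memory as a single BytesWritable:

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WholeFileInputFormat
        extends FileInputFormat<NullWritable, BytesWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // each file must go, whole, to exactly one mapper
    }

    @Override
    public RecordReader<NullWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new WholeFileRecordReader();
    }

    // Emits a single (NullWritable, BytesWritable) record per file,
    // holding the file's entire contents in memory.
    public static class WholeFileRecordReader
            extends RecordReader<NullWritable, BytesWritable> {

        private FileSplit split;
        private TaskAttemptContext context;
        private final BytesWritable value = new BytesWritable();
        private boolean processed = false;

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context) {
            this.split = (FileSplit) split;
            this.context = context;
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            if (processed) {
                return false;
            }
            // Buffer the whole file; this is the memory cost Mike mentions.
            byte[] contents = new byte[(int) split.getLength()];
            Path file = split.getPath();
            FileSystem fs = file.getFileSystem(context.getConfiguration());
            FSDataInputStream in = fs.open(file);
            try {
                IOUtils.readFully(in, contents, 0, contents.length);
            } finally {
                IOUtils.closeStream(in);
            }
            value.set(contents, 0, contents.length);
            processed = true;
            return true;
        }

        @Override
        public NullWritable getCurrentKey() { return NullWritable.get(); }

        @Override
        public BytesWritable getCurrentValue() { return value; }

        @Override
        public float getProgress() { return processed ? 1.0f : 0.0f; }

        @Override
        public void close() { /* stream already closed in nextKeyValue */ }
    }
}

A job would plug this in with job.setInputFormatClass(WholeFileInputFormat.class);
the cost of buffering whole files in memory is exactly why this pattern only
suits modest file sizes.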