Hadoop, mail # user - Merge Reducers Output


Re: Merge Reducers Output
Michael Segel 2012-07-31, 02:08
Why not use a combiner?

On Jul 30, 2012, at 7:59 PM, Mike S wrote:

> As has been asked several times before, I need to merge my reducers'
> output files. Imagine I have many reducers that generate 200 files. To
> merge them, I have written another MapReduce job in which each
> mapper reads a complete file into memory and emits it, and a single
> reducer then merges them. To do so, I had to
> write a custom file input reader that reads the complete file into
> memory and a custom file output format that appends each
> reducer item's bytes together. This is what my mapper and reducer look
> like:
>
> public static class MapClass
>         extends Mapper<NullWritable, BytesWritable, NullWritable, BytesWritable>
> {
>     // Pass each whole-file record through unchanged.
>     @Override
>     public void map(NullWritable key, BytesWritable value, Context context)
>             throws IOException, InterruptedException
>     {
>         context.write(key, value);
>     }
> }
>
> public static class Reduce
>         extends Reducer<NullWritable, BytesWritable, NullWritable, BytesWritable>
> {
>     // Emit every file's bytes; the custom output format appends them.
>     @Override
>     public void reduce(NullWritable key, Iterable<BytesWritable> values, Context context)
>             throws IOException, InterruptedException
>     {
>         for (BytesWritable value : values)
>         {
>             context.write(NullWritable.get(), value);
>         }
>     }
> }
>
> I still have to use a single reducer, and that is a bottleneck. Please
> note that I must do this merging because the users of my MR job are outside
> my Hadoop environment and need the result as one file.
>
> Is there a better way to merge the reducers' output files?
>
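
For reference, if the merge is a plain byte-level concatenation, it may not need a second MapReduce job at all. `hadoop fs -getmerge <src> <localdst>` concatenates the files in an HDFS directory into one local file. The same idea can be sketched in plain Java, assuming the part files have already been copied to a local directory and follow the usual `part-*` naming. The class name `PartFileMerger` and its `merge` method are hypothetical, made up for illustration, and not part of Hadoop.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not a Hadoop API): concatenates local copies of
// reducer output files into a single file.
final class PartFileMerger {

    // Appends every file in partsDir whose name matches "part-*" to merged,
    // in lexicographic path order, and returns the number of bytes written.
    static long merge(Path partsDir, Path merged) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(partsDir, "part-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        // Sort so that part-r-00000 comes before part-r-00001, and so on.
        Collections.sort(parts);

        long total = 0;
        try (OutputStream out = Files.newOutputStream(merged)) {
            for (Path p : parts) {
                // Files.copy streams the file, so no part is held fully in memory.
                total += Files.copy(p, out);
            }
        }
        return total;
    }
}
```

Calling `PartFileMerger.merge(dir, dir.resolve("merged.bin"))` would leave one file containing the parts back to back. This only fits when the output format tolerates simple concatenation; formats with per-file headers would need a format-aware merge instead.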