Hadoop >> mail # user >> Merge Reducers Output

Merge Reducers Output
As others have asked several times, I need to merge my reducers' output files.
Say I have many reducers that generate 200 files. To merge them, I have
written another MapReduce job in which each mapper reads one complete file
into memory and emits it, and a single reducer then merges them all. To do
so, I had to write a custom file input reader that reads the complete file
into memory, and a custom file output format that appends the bytes of each
reducer item together. This is how my mapper and reducer look:

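import java.io.IOException;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Each map() call receives one whole file as a single BytesWritable value.
// The output key type is NullWritable so that it matches the key that is written.
public static class MapClass extends Mapper<NullWritable, BytesWritable, NullWritable, BytesWritable> {
    public void map(NullWritable key, BytesWritable value, Context context)
            throws IOException, InterruptedException {
        context.write(key, value);
    }
}

// The single reducer passes every file's bytes through to the one output.
public static class Reduce extends Reducer<NullWritable, BytesWritable, NullWritable, BytesWritable> {
    public void reduce(NullWritable key, Iterable<BytesWritable> values, Context context)
            throws IOException, InterruptedException {
        for (BytesWritable value : values) {
            context.write(NullWritable.get(), value);
        }
    }
}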
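
For illustration only: the end result of the job above is a plain byte concatenation of the part files. A minimal sketch of that same result in plain Java follows, assuming the part files have already been copied to a local directory and are named with a "part-" prefix (for example part-r-00000). It does not use the Hadoop API, and the class name PartFileMerger is hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not part of Hadoop: concatenates local part files in name order.
class PartFileMerger {

    // Appends the bytes of every "part-*" file in partDir to target, ordered by file name.
    // Other files (such as a _SUCCESS marker) are skipped. Returns the number of files merged.
    static int merge(Path partDir, Path target) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (Stream<Path> entries = Files.list(partDir)) {
            entries.filter(p -> p.getFileName().toString().startsWith("part-"))
                   .forEach(parts::add);
        }
        // Sort by file name so that part-r-00000 precedes part-r-00001, and so on.
        parts.sort(Comparator.comparing((Path p) -> p.getFileName().toString()));

        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                // Streams the file to the output; no file is held fully in memory.
                Files.copy(part, out);
            }
        }
        return parts.size();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PartFileMerger <directory with part files> <merged output file>
        int merged = merge(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("merged " + merged + " part files");
    }
}
```

Because each file is streamed rather than loaded whole, memory use in this sketch does not grow with file size. It is still a single sequential pass, so it only shows the target result and does not remove the single-writer step.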

I still have to use a single reducer, and that is a bottleneck. Please
note that I must do this merge because the users of my MR job are outside
my Hadoop environment and need the result as one file.

Is there a better way to merge the reducers' output files?