Hadoop >> mail # user >> Merge Reducers Output


Merge Reducers Output
As others have asked several times, I need to merge my reducers' output
files. Suppose I have many reducers that together generate 200 files. To
merge them, I have written a second MapReduce job in which each mapper
reads one complete file into memory and emits it, and then a single
reducer merges them all. To do so, I had to write a custom file input
reader that reads a complete file into memory, and a custom file output
format that appends the bytes of each reducer item together. This is what
my mapper and reducer look like:

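import java.io.IOException;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Both classes are nested inside the job's driver class.

// Identity mapper: each input record is one whole file, passed through unchanged.
// The output key type must be NullWritable to match the key that is written.
public static class MapClass
        extends Mapper<NullWritable, BytesWritable, NullWritable, BytesWritable>
{
    @Override
    public void map(NullWritable key, BytesWritable value, Context context)
            throws IOException, InterruptedException
    {
        context.write(key, value);
    }
}

// Single reducer: writes every file's bytes out under the one shared key.
public static class Reduce
        extends Reducer<NullWritable, BytesWritable, NullWritable, BytesWritable>
{
    @Override
    public void reduce(NullWritable key, Iterable<BytesWritable> values, Context context)
            throws IOException, InterruptedException
    {
        for (BytesWritable value : values)
        {
            context.write(NullWritable.get(), value);
        }
    }
}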

I still need a single reducer, and that is a bottleneck. Please note
that I must do this merging because the users of my MR job are outside
my Hadoop environment and need the result as one file.
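
The result described above is a byte-wise concatenation of the reducer output files. As an illustration only, here is a minimal sketch of that concatenation in plain Java, outside Hadoop. It assumes the output files have already been copied to a local directory and follow the usual part-* naming; LocalPartMerger and merge are hypothetical names, not a Hadoop API.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Hadoop: concatenates local copies of
// reducer output files into a single file.
final class LocalPartMerger {

    // Appends every regular file matching "part-*" in partsDir to merged,
    // in file-name order, and returns the number of bytes written.
    // Assumption: merged itself does not match "part-*".
    static long merge(Path partsDir, Path merged) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(partsDir, "part-*")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    parts.add(p);
                }
            }
        }
        // All paths share one parent directory, so this orders them by file name.
        Collections.sort(parts);

        long total = 0;
        try (OutputStream out = Files.newOutputStream(merged)) {
            for (Path part : parts) {
                // Files.copy streams the bytes, so no part is held fully in memory.
                total += Files.copy(part, out);
            }
        }
        return total;
    }
}
```

Because each part is streamed rather than loaded whole, memory use stays small, but the work still runs through a single writer, so this sketch shows the desired output rather than a way around the bottleneck.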

Is there a better way to merge the reducers' output files?