Hadoop >> mail # user >> Merge Reducers Output


Mike S 2012-07-31, 00:59
Michael Segel 2012-07-31, 02:08

Re: Merge Reducers Output
Hi

Why not use 'hadoop fs -getmerge <outputFolderInHdfs> <targetFileNameInLfs>' while copying files out of HDFS for the end users to consume? This will merge all the files in 'outputFolderInHdfs' into one file and put it in the local file system.
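A minimal sketch of that command; the paths below are illustrative, not taken from the original job:

```shell
# Concatenate every file in the HDFS output directory into a single
# file on the local file system. The source directory and target file
# are hypothetical examples.
hadoop fs -getmerge /user/mike/job-output /tmp/merged-output.txt

# The end users can now consume the single local file:
ls -lh /tmp/merged-output.txt
```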

Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Michael Segel <[EMAIL PROTECTED]>
Date: Mon, 30 Jul 2012 21:08:22
To: <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Subject: Re: Merge Reducers Output

Why not use a combiner?

On Jul 30, 2012, at 7:59 PM, Mike S wrote:

> Like asked several times before, I need to merge my reducers' output
> files. Imagine I have many reducers which will generate 200 files. To
> merge them together, I have written another MapReduce job where each
> mapper reads a complete file into memory and outputs it, and then a
> single reducer merges them all together. To do so, I had to write a
> custom file input format that reads the complete file into memory, and
> another custom file output format that appends each reduced item's
> bytes together. This is how my mapper and reducer look:
>
> public static class MapClass extends Mapper<NullWritable,
> BytesWritable, NullWritable, BytesWritable>
> {
>     @Override
>     public void map(NullWritable key, BytesWritable value, Context context)
>         throws IOException, InterruptedException
>     {
>         // Pass each file's bytes through unchanged; every record shares
>         // the NullWritable key, so they all meet at a single reducer.
>         context.write(key, value);
>     }
> }
>
> public static class Reduce extends Reducer<NullWritable,
> BytesWritable, NullWritable, BytesWritable>
> {
>     @Override
>     public void reduce(NullWritable key, Iterable<BytesWritable> values,
>         Context context) throws IOException, InterruptedException
>     {
>         // Append every file's bytes into the single output file.
>         for (BytesWritable value : values)
>         {
>             context.write(NullWritable.get(), value);
>         }
>     }
> }
>
> I still have to have one reducer, and that is a bottleneck. Please
> note that I must do this merging, as the users of my MR job are
> outside my Hadoop environment and need the result as one file.
>
> Is there a better way to merge reducers' output files?
>

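The single-reducer merge job described above can also be replaced by a client-side concatenation at the moment the files are copied out of HDFS. A minimal sketch, assuming a hypothetical output directory /user/mike/job-output:

```shell
# Stream all part files (in name order) straight into one local file.
# This avoids both the second MapReduce job and the single-reducer
# bottleneck. Paths are illustrative, not from the original job.
hadoop fs -cat /user/mike/job-output/part-* > /tmp/merged-output.txt
```

This is equivalent in effect to 'hadoop fs -getmerge', but lets you filter or reorder the part files with ordinary shell globbing if needed.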
Jay Vyas 2012-07-31, 05:08
Michael Segel 2012-07-31, 12:24
Raj Vishwanathan 2012-07-31, 15:10
Michael Segel 2012-07-31, 20:44
Mike S 2012-08-01, 00:28