MapReduce >> mail # user >> Writing to multiple directories in hadoop


Re: Writing to multiple directories in hadoop
Hi Jamal,

If I remember correctly, you can use the write(key, value, baseOutputPath)
method of MultipleOutputs in your reducer to write to different directories.

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html#write(KEYOUT, VALUEOUT, java.lang.String)

Here is what the API documentation says:

Use MultipleOutputs.write(KEYOUT key, VALUEOUT value, String baseOutputPath) to
write key and value to a path specified by baseOutputPath, with no need to
specify a named output:

 private MultipleOutputs<Text, Text> out;

 public void setup(Context context) {
   out = new MultipleOutputs<Text, Text>(context);
   ...
 }

 public void reduce(Text key, Iterable<Text> values, Context context)
     throws IOException, InterruptedException {
   for (Text t : values) {
     out.write(key, t, generateFileName(<*parameter list...*>));
   }
 }

 protected void cleanup(Context context)
     throws IOException, InterruptedException {
   out.close();
 }
Use your own code in generateFileName() to create a custom path for your
results. '/' characters in baseOutputPath will be translated into directory
levels in your file system. Also, append "part" or similar to your
custom-generated path, otherwise your output files will be named -r-00000,
-r-00001, etc. No call to context.write() is necessary. See the example
generateFileName() code below.

 private String generateFileName(Text k) {
   // expect Text k in format "Surname|Forename"
   String[] kStr = k.toString().split("\\|");

   String sName = kStr[0];
   String fName = kStr[1];

   // example for k = Smith|John
   // output written to /user/hadoop/path/to/output/Smith/John-r-00000 (etc)
   return sName + "/" + fName;
 }
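For the directory layout you want (output/foo/part-r-00000 and so on), the
trick is the baseOutputPath string you pass to the three-argument write().
Below is a minimal, hedged sketch — basePath() is a hypothetical helper I
made up for illustration, not part of the MultipleOutputs API; only the
mos.write(key, value, baseOutputPath) call shown in the comments is the real
API method:

```java
// Sketch: building baseOutputPath strings that should yield a layout like
// output/foo/part-r-00000. basePath() is a hypothetical helper, not a
// Hadoop API; it just appends "part" so that '/' becomes a directory
// level and files keep the usual "part-r-NNNNN" name.
public class BasePaths {

    static String basePath(String dir) {
        return dir + "/part";
    }

    public static void main(String[] args) {
        // In the reducer, the assumption is you would call, e.g.:
        //   mos.write(key, value, basePath("foo"));
        // instead of mos.write("foo", key, value).
        System.out.println(basePath("foo"));    // foo/part
        System.out.println(basePath("bar"));    // bar/part
        System.out.println(basePath("foobar")); // foobar/part
    }
}
```

With named outputs alone (your current mos.write("foo", ...) calls), the
name is used as a file-name prefix in the job output directory, which is why
you see output/foo-r-0001 rather than a foo/ subdirectory.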
Best Regards,
Sonal
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>
On Sat, Oct 12, 2013 at 3:49 AM, jamal sasha <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I am trying to separate my reducer output into different folders.
>
> My driver has the following code:
>  FileOutputFormat.setOutputPath(job, new Path(output));
>             //MultipleOutputs.addNamedOutput(job, namedOutput,
> outputFormatClass, keyClass, valueClass)
>             MultipleOutputs.addNamedOutput(job, "foo",
> TextOutputFormat.class, NullWritable.class, Text.class);
>             MultipleOutputs.addNamedOutput(job, "bar",
> TextOutputFormat.class, Text.class,NullWritable.class);
>             MultipleOutputs.addNamedOutput(job, "foobar",
> TextOutputFormat.class, Text.class, NullWritable.class);
>
> And then my reducer has the following code:
> mos.write("foo",NullWritable.get(),new Text(jsn.toString()));
> mos.write("bar", key,NullWritable.get());
> mos.write("foobar", key,NullWritable.get());
>
> But in the output, I see:
>
> output/foo-r-0001
> output/foo-r-0002
> output/foobar-r-0001
> output/bar-r-0001
>
>
> But what I am trying is :
>
> output/foo/part-r-0001
> output/foo/part-r-0002
> output/bar/part-r-0001
> output/foobar/part-r-0001
>
> How do I do this?
> Thanks
>