jamal sasha 2013-10-11, 22:19
Re: Writing to multiple directories in hadoop
Hi Jamal,

If I remember correctly, you can use the write(key, value, baseOutputPath)
method of MultipleOutputs in your reducer to write to different directories.

Here is what the API says:

Use MultipleOutputs.write(KEYOUT key, VALUEOUT value, String baseOutputPath) to
write key and value to a path specified by baseOutputPath, with no need to
specify a named output:

 private MultipleOutputs<Text, Text> out;

 public void setup(Context context) {
   out = new MultipleOutputs<Text, Text>(context);
 }

 public void reduce(Text key, Iterable<Text> values, Context context)
     throws IOException, InterruptedException {
   for (Text t : values) {
     out.write(key, t, generateFileName(<parameter list...>));
   }
 }

 protected void cleanup(Context context)
     throws IOException, InterruptedException {
   // close all opened outputs, otherwise data may not be flushed
   out.close();
 }

Use your own code in generateFileName() to create a custom path to your
results. '/' characters in baseOutputPath will be translated into directory
levels in your file system. Also, append your custom-generated path with
"part" or similar, otherwise your output will be -00000, -00001 etc. No
call to context.write() is necessary. See the example generateFileName() code
below:

 private String generateFileName(Text k) {
   // expect Text k in format "Surname|Forename"
   String[] kStr = k.toString().split("\\|");

   String sName = kStr[0];
   String fName = kStr[1];

   // example for k = Smith|John
   // output written to /user/hadoop/path/to/output/Smith/John-r-00000 (etc)
   return sName + "/" + fName;
 }

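As a rough sketch of the "append part" advice above, the same path logic can be tried out in plain Java without a cluster. The class name OutputPaths, the use of String instead of Text, and the "unparsed" fallback directory are assumptions made for illustration; they are not part of the Hadoop API.

```java
import java.util.regex.Pattern;

final class OutputPaths {
    private static final Pattern SEPARATOR = Pattern.compile("\\|");

    // Hypothetical helper: expects "Surname|Forename" and returns
    // "Surname/Forename/part", so the files end up in a per-key directory
    // with the usual "part" prefix.
    static String generateFileName(String key) {
        String[] parts = SEPARATOR.split(key, -1);
        if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            // Assumption: malformed keys are collected in one fallback directory
            // instead of failing with an ArrayIndexOutOfBoundsException.
            return "unparsed/part";
        }
        return parts[0] + "/" + parts[1] + "/part";
    }

    public static void main(String[] args) {
        System.out.println(generateFileName("Smith|John"));
    }
}
```

In the reducer you would pass k.toString() to such a helper and hand the result to out.write(key, t, ...) as the base output path.
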
Best Regards,
Nube Technologies <http://www.nubetech.co>

On Sat, Oct 12, 2013 at 3:49 AM, jamal sasha <[EMAIL PROTECTED]> wrote:

> Hi,
> I am trying to separate my reducer output into different folders.
> My driver has the following code:
> FileOutputFormat.setOutputPath(job, new Path(output));
> // MultipleOutputs.addNamedOutput(job, namedOutput, outputFormatClass, keyClass, valueClass)
> MultipleOutputs.addNamedOutput(job, "foo", TextOutputFormat.class, NullWritable.class, Text.class);
> MultipleOutputs.addNamedOutput(job, "bar", TextOutputFormat.class, Text.class, NullWritable.class);
> MultipleOutputs.addNamedOutput(job, "foobar", TextOutputFormat.class, Text.class, NullWritable.class);
> And then my reducer has the following code:
> mos.write("foo", NullWritable.get(), new Text(jsn.toString()));
> mos.write("bar", key, NullWritable.get());
> mos.write("foobar", key, NullWritable.get());
> But in the output, I see:
> output/foo-r-0001
> output/foo-r-0002
> output/foobar-r-0001
> output/bar-r-0001
> But what I am trying to get is:
> output/foo/part-r-0001
> output/foo/part-r-0002
> output/bar/part-r-0001
> output/foobar/part-r-0001
> How do I do this?
> Thanks
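
To make the difference between the two layouts in the question concrete, here is a small plain-Java illustration of how the base output path ends up in the file name. It only imitates the "<base>-r-<five-digit partition>" pattern visible in the John-r-00000 example of the reply; NamedOutputLayout and reducerFile are hypothetical names, not Hadoop calls, and the partition number 1 is just a sample value.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class NamedOutputLayout {
    // Illustration only: joins the job output directory, the base output path
    // and a reducer suffix the way the resulting file names appear on disk.
    static String reducerFile(String outputDir, String baseOutputPath, int partition) {
        return String.format(Locale.ROOT, "%s/%s-r-%05d", outputDir, baseOutputPath, partition);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("foo", "bar", "foobar");
        for (String name : names) {
            // base path without a slash: file sits directly in the output dir
            System.out.println(reducerFile("output", name, 1));
            // base path "<name>/part": file sits in its own sub-directory
            System.out.println(reducerFile("output", name + "/part", 1));
        }
    }
}
```

So, under these assumptions, a base path such as "foo/part" is what produces the output/foo/part-r-... layout asked for, while a bare "foo" gives the flat output/foo-r-... names.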