MapReduce, mail # user - Writing to multiple directories in hadoop


jamal sasha 2013-10-11, 22:19
Re: Writing to multiple directories in hadoop
Sonal Goyal 2013-10-12, 14:53
Hi Jamal,

If I remember correctly, you can use the write(key, value, baseOutputPath)
method of MultipleOutputs in your reducer to write to different directories.

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html#write(KEYOUT, VALUEOUT, java.lang.String)

Here is what the API documentation says:

Use MultipleOutputs.write(KEYOUT key, VALUEOUT value, String baseOutputPath) to
write key and value to a path specified by baseOutputPath, with no need to
specify a named output:

 // Type parameters are the reducer's output key and value types.
 private MultipleOutputs<Text, Text> out;

 public void setup(Context context) {
   out = new MultipleOutputs<Text, Text>(context);
   ...
 }

 public void reduce(Text key, Iterable<Text> values, Context context)
     throws IOException, InterruptedException {
   for (Text t : values) {
     out.write(key, t, generateFileName(<parameter list...>));
   }
 }

 protected void cleanup(Context context)
     throws IOException, InterruptedException {
   // Closing flushes every file opened through MultipleOutputs.
   out.close();
 }
Use your own code in generateFileName() to create a custom path to your
results. '/' characters in baseOutputPath will be translated into directory
levels in your file system. Also, append your custom-generated path with
"part" or similar, otherwise your output will be -00000, -00001 etc. No
call to context.write() is necessary. See example generateFileName() code
below.

 private String generateFileName(Text k) {
   // expect Text k in format "Surname|Forename"
   String[] kStr = k.toString().split("\\|");

   String sName = kStr[0];
   String fName = kStr[1];

   // example for k = Smith|John
   // output written to /user/hadoop/path/to/output/Smith/John-r-00000 (etc)
   return sName + "/" + fName;
 }
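
For the layout in your question, the base path would be something like
"foo/part" rather than a name derived from the key. Below is a rough,
Hadoop-free sketch of how a base path turns into a file name. basePath() and
fileName() are hypothetical helpers, not part of the Hadoop API, and the "-r-"
plus zero-padded partition suffix is an assumption based on the example above:

```java
import java.util.Locale;

public class OutputPathSketch {

    // Hypothetical helper: builds the string that would be passed as
    // baseOutputPath. The "/" becomes a directory level under the job output.
    public static String basePath(String directory) {
        return directory + "/part";
    }

    // Assumption: the framework appends "-r-" (reduce task) and a five-digit,
    // zero-padded partition number to the base path.
    public static String fileName(String basePath, int partition) {
        return String.format(Locale.ROOT, "%s-r-%05d", basePath, partition);
    }

    public static void main(String[] args) {
        // Prints the relative paths for the three outputs in the question.
        for (String dir : new String[] {"foo", "bar", "foobar"}) {
            System.out.println("output/" + fileName(basePath(dir), 0));
        }
    }
}
```

In the reducer, the value returned by basePath("foo") is what would go in as
the baseOutputPath argument of write().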
Best Regards,
Sonal
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>
On Sat, Oct 12, 2013 at 3:49 AM, jamal sasha <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I am trying to separate my reducer output into different folders.
>
> My driver has the following code:
>     FileOutputFormat.setOutputPath(job, new Path(output));
>     // MultipleOutputs.addNamedOutput(job, namedOutput, outputFormatClass, keyClass, valueClass)
>     MultipleOutputs.addNamedOutput(job, "foo", TextOutputFormat.class, NullWritable.class, Text.class);
>     MultipleOutputs.addNamedOutput(job, "bar", TextOutputFormat.class, Text.class, NullWritable.class);
>     MultipleOutputs.addNamedOutput(job, "foobar", TextOutputFormat.class, Text.class, NullWritable.class);
>
> And then my reducer has the following code:
> mos.write("foo",NullWritable.get(),new Text(jsn.toString()));
> mos.write("bar", key,NullWritable.get());
> mos.write("foobar", key,NullWritable.get());
>
> But in the output, I see:
>
> output/foo-r-0001
> output/foo-r-0002
> output/foobar-r-0001
> output/bar-r-0001
>
>
> But what I am trying to get is:
>
> output/foo/part-r-0001
> output/foo/part-r-0002
> output/bar/part-r-0001
> output/foobar/part-r-0001
>
> How do I do this?
> Thanks
>