Outputformat and RecordWriter in Hadoop Pipes
Vivek K 2011-09-13, 16:27
Hi all,
I am trying to build a Hadoop/MR application in C++ using hadoop-pipes. I have been able to work successfully with my own mappers and reducers, but now I need to generate output (from the reducer) in a format different from the default TextOutputFormat. I have a few questions:
(1) Similar to Hadoop streaming, is there an option to set the OutputFormat in Hadoop Pipes (in order to use, say, org.apache.hadoop.io.SequenceFile.Writer)? I am using Hadoop version 0.20.2.
(2) As a simple test of how to use a built-in non-default writer, I tried the following:
hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=false -input input.seq -output output -inputformat org.apache.hadoop.mapred.SequenceFileInputFormat -writer org.apache.hadoop.io.SequenceFile.Writer -program my_test_program
However, this fails with a ClassNotFound exception. If I remove the -writer flag and use the default writer, it works just fine.
(3) Is there an example or discussion of how to write your own RecordWriter and run it with Hadoop Pipes?
Thanks.
Best,
Vivek
Re: Outputformat and RecordWriter in Hadoop Pipes
Vivek K 2011-09-20, 21:56
It would be very helpful if someone could point me to where I might find a solution to this problem.
Thanks.
Vivek
Re: Outputformat and RecordWriter in Hadoop Pipes
Brock Noland 2011-09-20, 22:25
Hi,
On Tue, Sep 13, 2011 at 12:27 PM, Vivek K <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I am trying to build a Hadoop/MR application in c++ using hadoop-pipes. I
> have been able to successfully work with my own mappers and reducers, but
> now I need to generate output (from reducer) in a format different from the
> default TextOutputFormat. I have a few questions:
>
> (1) Similar to Hadoop streaming, is there an option to set OutputFormat in
> HadoopPipes (in order to use say org.apache.hadoop.io.SequenceFile.Writer) ?
> I am using Hadoop version 0.20.2.
>
> (2) For a simple test on how to use an in-built non-default writer, I tried
> the following:
>
> hadoop pipes -D hadoop.pipes.java.recordreader=true -D
> hadoop.pipes.java.recordwriter=false -input input.seq -output output
> -inputformat org.apache.hadoop.mapred.SequenceFileInputFormat -writer
> org.apache.hadoop.io.SequenceFile.Writer -program my_test_program

-writer wants an OutputFormat:
    if (results.hasOption("writer")) {
      setIsJavaRecordWriter(job, true);
      job.setOutputFormat(getClass(results, "writer", job, OutputFormat.class));
    }
As such I think you want:
-writer org.apache.hadoop.mapred.SequenceFileOutputFormat
SequenceFile.Writer simply writes sequence files; it has nothing to do with MapReduce.
This is also wrong, since passing -writer already switches the Java record writer on:
hadoop.pipes.java.recordwriter=false
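To illustrate why the original -writer value fails, here is a minimal sketch using only JDK classes. It assumes the pipes submitter resolves the -writer argument roughly like Class.forName(name).asSubclass(expected); the snippet above only shows the call to getClass, so that is an assumption and not a confirmed detail of the Hadoop source. The class WriterLookupSketch and its resolve helper are hypothetical names. java.util.Map.Entry stands in for a nested class such as SequenceFile.Writer, and Collection/List stand in for OutputFormat.

```java
import java.util.Collection;
import java.util.List;

public class WriterLookupSketch {

    /**
     * Hypothetical helper: load a class by name and check that it is a
     * subtype of the expected type, reporting which step failed.
     * (Assumed to mirror what a -writer style lookup does.)
     */
    static String resolve(String className, Class<?> expected) {
        try {
            Class.forName(className).asSubclass(expected);
            return "ok";
        } catch (ClassNotFoundException e) {
            // The name does not denote a loadable class.
            return "ClassNotFoundException";
        } catch (ClassCastException e) {
            // The class exists but is not a subtype of the expected type.
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        // A nested class written with a dot is not a valid binary name.
        System.out.println(resolve("java.util.Map.Entry", Collection.class));
        // prints "ClassNotFoundException"

        // With '$' the class loads, but it is still the wrong kind of class.
        System.out.println(resolve("java.util.Map$Entry", Collection.class));
        // prints "ClassCastException"

        // A top-level class of the expected type resolves cleanly.
        System.out.println(resolve("java.util.ArrayList", List.class));
        // prints "ok"
    }
}
```

Under that assumption, a nested class written with a dot cannot be loaded by name at all, and even with the correct binary name it would still not pass as an OutputFormat. A top-level OutputFormat implementation such as org.apache.hadoop.mapred.SequenceFileOutputFormat avoids both problems.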
Brock
Re: Outputformat and RecordWriter in Hadoop Pipes
Vivek K 2011-09-20, 23:04
Hi Brock
Thanks for a prompt and to-the-point response. It is working as you said.
Best,
Vivek