Hadoop >> mail # user >> Outputformat and RecordWriter in Hadoop Pipes


Vivek K 2011-09-13, 16:27
Vivek K 2011-09-20, 21:56
Re: Outputformat and RecordWriter in Hadoop Pipes
Hi,

On Tue, Sep 13, 2011 at 12:27 PM, Vivek K <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I am trying to build a Hadoop/MR application in c++ using hadoop-pipes. I
> have been able to successfully work with my own mappers and reducers, but
> now I need to generate output (from reducer) in a format different from the
> default TextOutputFormat. I have a few questions:
>
> (1) Similar to Hadoop streaming, is there an option to set OutputFormat in
> HadoopPipes (in order to use say org.apache.hadoop.io.SequenceFile.Writer) ?
> I am using Hadoop version 0.20.2.
>
> (2) For a simple test on how to use an in-built non-default writer, I tried
> the following:
>
>     hadoop pipes -D hadoop.pipes.java.recordreader=true -D
> hadoop.pipes.java.recordwriter=false -input input.seq -output output
> -inputformat org.apache.hadoop.mapred.SequenceFileInputFormat -writer
> org.apache.hadoop.io.SequenceFile.Writer -program my_test_program
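
The -writer option expects an OutputFormat class, not a writer class:

      // -writer turns on the Java record writer and names the OutputFormat
      if (results.hasOption("writer")) {
        setIsJavaRecordWriter(job, true);
        job.setOutputFormat(getClass(results, "writer", job,
                                      OutputFormat.class));
      }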

As such I think you want:

-writer org.apache.hadoop.mapred.SequenceFileOutputFormat

SequenceFile.Writer simply writes sequence files; it has nothing to do with
MapReduce.

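This is also wrong, because a Java OutputFormat needs the Java record writer
enabled (the -writer code path above sets it to true):

hadoop.pipes.java.recordwriter=false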
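
Putting both corrections together, the invocation would look roughly like the sketch below. It reuses the paths and program name from the original message; PIPES_CMD is just an illustrative variable name, and setting hadoop.pipes.java.recordwriter=true explicitly is likely redundant, since -writer already enables it. This has not been run against a 0.20.2 cluster, so treat it as a starting point.

```shell
# Corrected invocation, assembled in a variable so it can be reviewed first.
# Input/output paths and the program name come from the original message.
PIPES_CMD="hadoop pipes \
-D hadoop.pipes.java.recordreader=true \
-D hadoop.pipes.java.recordwriter=true \
-input input.seq -output output \
-inputformat org.apache.hadoop.mapred.SequenceFileInputFormat \
-writer org.apache.hadoop.mapred.SequenceFileOutputFormat \
-program my_test_program"

# Print the command instead of running it.
echo "$PIPES_CMD"
# To submit the job: eval "$PIPES_CMD"
```

Printing the command first makes it easy to check that -writer names an OutputFormat rather than SequenceFile.Writer before the job is submitted.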

Brock
Vivek K 2011-09-20, 23:04