NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, CentOS, red hat, debian, puppet labs, java, senseiDB

MapReduce >> mail # user >> How to set SequenceFile.Metadata from within SequenceFileOutputFormat?


Re: How to set SequenceFile.Metadata from within SequenceFileOutputFormat?
On 08/09/2010 09:14 PM, Harsh J wrote:
> Another solution would be to create a custom named output using
> mapred.lib.MultipleOutputs and collecting to that instead of the
> job-set output format (which one can set to NullOutputFormat so it
> doesn't complain about existing paths, etc.).
>
> So if you want a 'foo' prefix on your 00000-NNNNN numbered output
> files (instead of the default 'part'), you'd create it with
> MultipleOutputs.addNamedOutput(conf, "foo", YourOutFormat.class,
> Key.class, Value.class);
>
> The extension, I believe, can be changed too, when 'getting' the path
> from the FileOutputFormat while building your RecordWriter. Something
> like:
> Path outPath = FileOutputFormat.getTaskOutputPath(job, name+YOUR_EXTENSION);
> // Now create the 'writer' on this path.

Tnx for the tip - didn't know about MultipleOutputs.  (Though it's
probably overkill for what I'm doing.)

Thanks again,

DR
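The two naming tips in the quoted message (a custom prefix in place of 'part', and an extension appended to `name` before asking FileOutputFormat for the task output path) can be illustrated in plain Java. This is a minimal sketch, not Hadoop code: `OutputNames` and `fileName` are hypothetical names, and the five-digit zero padding is an assumption based on the familiar part-00000 file names. The exact names a real job writes, especially for MultipleOutputs named outputs, may vary with the Hadoop version, so check the files your job actually produces.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not part of Hadoop) that composes an output file
 * name from a prefix, a task partition number and an optional extension.
 */
public final class OutputNames {

    private OutputNames() {}

    /**
     * Builds a name such as "foo-00003.seq".
     *
     * @param prefix    the output name, e.g. "foo" instead of the default "part"
     * @param partition the task's partition number; must be non-negative
     * @param extension an extension such as ".seq", or "" / null for none
     */
    public static String fileName(String prefix, int partition, String extension) {
        if (prefix == null || prefix.isEmpty()) {
            throw new IllegalArgumentException("prefix must be non-empty");
        }
        if (partition < 0) {
            throw new IllegalArgumentException("partition must be >= 0");
        }
        // Zero-pad to at least five digits (assumed, as in part-00000);
        // Locale.ROOT keeps the digits ASCII whatever the JVM's default locale.
        String base = String.format(Locale.ROOT, "%s-%05d", prefix, partition);
        // Append the extension last, mirroring name + YOUR_EXTENSION in the tip.
        return extension == null ? base : base + extension;
    }

    public static void main(String[] args) {
        System.out.println(fileName("part", 0, ""));
        System.out.println(fileName("foo", 3, ".seq"));
    }
}
```

Run as-is, `main` prints `part-00000` and then `foo-00003.seq`. Appending the extension to the already-numbered name keeps the numbering intact, which is why the tip adds it to `name` rather than to the prefix.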