MapReduce >> mail # user >> Multiple input formats and multiple output formats in Hadoop 0.20.2


Jian Fang 2011-08-10, 16:08
Re: Multiple input formats and multiple output formats in Hadoop 0.20.2
Hi John,

I think this is what you are looking for:

http://archive.cloudera.com/cdh/3/hadoop/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html

http://archive.cloudera.com/cdh/3/hadoop/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html

Usage examples are included in the API docs for both classes.

Regards,
Dino Kečo
On Wed, Aug 10, 2011 at 6:08 PM, Jian Fang <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I am working on a project that requires multiple input formats and
> multiple output formats. Basically, I store sales rank data in a
> Cassandra cluster, and each day I get a sales rank update file to update
> the ranks in Cassandra. At the same time, I need to find all the products
> whose rank change exceeds a threshold and output them to a file. That is to
> say, I need two input formats, one from the file system (sales rank update
> file) and one from Cassandra (current sales rank), and two output
> formats, one to the file system (products whose rank change exceeds a
> threshold) and one to Cassandra (write the new rank to Cassandra).
>
> Right now, I use multiple cascading jobs to do the work and use HDFS to
> share data among the jobs. But this is not very efficient, since some
> intermediate files need to be read multiple times in different jobs. I
> wonder if there is a more elegant way to solve this problem. It seems Hadoop
> 0.19 supports multiple input/output formats. It would be great if I could
> merge the multiple jobs into one with multiple input formats and multiple
> output formats. Is this doable in Hadoop 0.20.2? Are there any examples of
> multiple input formats and multiple output formats for Hadoop 0.20.2?
>
> Thanks in advance,
>
> John
>
>
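For readers following the thread, here is a minimal plain-Java sketch of the single-pass logic the question describes: two inputs (current ranks and the daily update) go in, two outputs (new ranks and products whose rank moved past a threshold) come out. It is not Hadoop code and not anyone's actual job. The class name `RankMergeSketch`, the `merge` method, and the in-memory maps are all hypothetical stand-ins. In a real job, something like this would presumably sit in a reducer keyed by product ID, with the two inputs registered through `MultipleInputs` and the two outputs written through `MultipleOutputs`; whether the Cassandra input and output formats fit that setup is an assumption that would need checking.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: models the two-input, two-output merge in memory.
public class RankMergeSketch {

    /** The two result streams produced by one pass over the update data. */
    public static final class Result {
        // Stand-in for the output that would be written back to Cassandra.
        public final Map<String, Integer> newRanks = new HashMap<>();
        // Stand-in for the output that would be written to a file.
        public final List<String> bigMovers = new ArrayList<>();
    }

    /**
     * @param currentRanks product ID -> current rank (stand-in for the Cassandra input)
     * @param updates      product ID -> new rank (stand-in for the daily update file)
     * @param threshold    a product is reported when |newRank - oldRank| > threshold
     */
    public static Result merge(Map<String, Integer> currentRanks,
                               Map<String, Integer> updates,
                               int threshold) {
        Result result = new Result();
        for (Map.Entry<String, Integer> update : updates.entrySet()) {
            String productId = update.getKey();
            int newRank = update.getValue();

            // Every updated product gets its new rank recorded.
            result.newRanks.put(productId, newRank);

            // Only products that already had a rank can have a rank change.
            Integer oldRank = currentRanks.get(productId);
            if (oldRank != null && Math.abs(newRank - oldRank) > threshold) {
                result.bigMovers.add(productId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> current = new HashMap<>();
        current.put("A", 10);
        current.put("B", 50);

        Map<String, Integer> updates = new HashMap<>();
        updates.put("A", 12);
        updates.put("B", 5);
        updates.put("C", 1);

        Result r = merge(current, updates, 20);
        System.out.println("new ranks: " + r.newRanks);
        System.out.println("big movers: " + r.bigMovers);
    }
}
```

The point of the sketch is that both outputs fall out of the same loop, so the intermediate data only has to be read once, which is the inefficiency the original question was trying to avoid.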
Jian Fang 2011-08-10, 16:26