Re: Multiple input formats and multiple output formats in Hadoop 0.20.2
Dino Kečo 2011-08-10, 16:20
I think this is what you are looking for:
Usage examples are part of the API doc.
On Wed, Aug 10, 2011 at 6:08 PM, Jian Fang <[EMAIL PROTECTED]> wrote:
> I am working on a project that requires multiple input formats and
> multiple output formats. Basically, I store sales rank data in a
> Cassandra cluster, and I get a sales rank update file each day to update the
> ranks in Cassandra. Meanwhile, I need to find all the products
> whose rank change exceeds a threshold and output them to a file. That is to
> say, I need two input formats, one from the file system (sales rank update
> file) and one from Cassandra (current sales rank), and two output
> formats, one to the file system (products whose rank change exceeds a
> threshold) and one to Cassandra (write the new rank to Cassandra).
> Right now, I use multiple cascading jobs to do the work and use HDFS to
> share data among jobs. But this is not very efficient, since some
> intermediate files need to be read multiple times in different jobs. I
> wonder if there is a more elegant way to solve this problem. It seems Hadoop
> 0.19 supports multiple input/output formats. It would be great if I could
> merge the multiple jobs into one with multiple input formats and multiple
> output formats. Is this doable in Hadoop 0.20.2? Are there any examples of
> multiple input formats and multiple output formats for Hadoop 0.20.2?
> Thanks in advance,
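
For reference, the per-product logic described in the question (compare the stored rank with the updated rank, report large changes, and write back the new rank) can be sketched in plain Java. This is only a sketch of what a single merged pass might compute, not Hadoop or Cassandra code: the class name RankChangeSketch, the method process, and the Result holder are all hypothetical, and treating products with no stored rank as "not changed" is an assumption the question does not settle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: models the two inputs and two outputs as in-memory maps.
class RankChangeSketch {

    // Holds the two logical outputs of one pass over the update file.
    static final class Result {
        // Product IDs whose rank moved by more than the threshold (file output).
        final List<String> exceeded = new ArrayList<>();
        // New rank per product (what would be written back to Cassandra).
        final Map<String, Integer> newRanks = new TreeMap<>();
    }

    static Result process(Map<String, Integer> currentRanks,
                          Map<String, Integer> updates,
                          int threshold) {
        Result result = new Result();
        // Iterate in sorted key order so the output order is deterministic.
        for (Map.Entry<String, Integer> e : new TreeMap<>(updates).entrySet()) {
            String productId = e.getKey();
            int newRank = e.getValue();
            // Every updated product gets its new rank written back.
            result.newRanks.put(productId, newRank);
            Integer oldRank = currentRanks.get(productId);
            // Assumption: a product with no stored rank is new, not "changed".
            if (oldRank != null && Math.abs(newRank - oldRank) > threshold) {
                result.exceeded.add(productId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> current = new TreeMap<>();
        current.put("A", 10);
        current.put("B", 200);

        Map<String, Integer> updates = new TreeMap<>();
        updates.put("A", 12);
        updates.put("B", 50);
        updates.put("C", 7);

        Result r = process(current, updates, 100);
        System.out.println("exceeded: " + r.exceeded);
        System.out.println("new ranks: " + r.newRanks);
    }
}
```

In a real job the two maps would be replaced by records arriving from the update file and from Cassandra, grouped by product ID, and the two result collections by the two outputs. That wiring depends on the input and output formats chosen and is not shown here.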