Re: how to specify MultipleOutputs, MultipleInputs in using Avro mapred API
Harsh J 2010-08-18, 17:49
On Wed, Aug 18, 2010 at 11:07 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> On 08/18/2010 10:18 AM, ey-chih chow wrote:
>> Thanks. But doing it this way, what kind of advantage can we get from
> The Avro MapReduce API is easiest to use when both inputs and outputs are
> Avro data.
> If inputs are not Avro data, but you want to use the rest of the Avro MR
> API, then you'd need to write an InputFormat that produces an AvroWrapper<T>
> where T is a type that Avro can serialize.
> Another alternative might be to first convert your inputs to be Avro data
> files. For example, one can use Avro's 'fromtext' tool to convert
> line-oriented files into equivalent compressed, splittable, Avro data files.
> This could be done as log files are loaded into HDFS, since this tool
> accepts Hadoop paths as output.
> We hope to add more tools for such conversion/ingest, e.g.:
Off-topic, but is any work already being done on this? I saw one
of them tagged with 'GSOC', so I'd like to know before I sink something
> We also expect that systems like Flume will produce Avro data files.
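
For anyone trying the InputFormat route Doug describes above, the core of it is a record reader that loads each input record into a reusable wrapper object. Below is a rough, dependency-free sketch of just that wrapping step, not a working Hadoop InputFormat. `Wrapper<T>` is a hypothetical stand-in for Avro's `AvroWrapper<T>`, and `LineWrapperReader` is a hypothetical class that only imitates the `next(key)` style of a Hadoop RecordReader. A real implementation would extend Hadoop's FileInputFormat and use the actual Avro and Hadoop classes instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class LineToWrapperSketch {

    /** Hypothetical stand-in for AvroWrapper<T>: a reusable holder for one datum. */
    static final class Wrapper<T> {
        private T datum;

        T datum() { return datum; }

        void datum(T datum) { this.datum = datum; }
    }

    /** Hypothetical reader imitating the next(key) contract of a RecordReader. */
    static final class LineWrapperReader {
        private final BufferedReader in;

        LineWrapperReader(BufferedReader in) { this.in = in; }

        /** Loads the next line into the wrapper; returns false at end of input. */
        boolean next(Wrapper<String> key) throws IOException {
            String line = in.readLine();
            if (line == null) {
                return false;
            }
            key.datum(line);
            return true;
        }
    }

    /** Drains the reader over the given text, collecting each wrapped datum. */
    public static List<String> readAll(String text) throws IOException {
        LineWrapperReader reader =
            new LineWrapperReader(new BufferedReader(new StringReader(text)));
        Wrapper<String> key = new Wrapper<>();   // one wrapper, reused per record
        List<String> records = new ArrayList<>();
        while (reader.next(key)) {
            records.add(key.datum());
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        for (String record : readAll("first log line\nsecond log line\n")) {
            System.out.println(record);
        }
    }
}
```

The single wrapper is reused across records on purpose, since Hadoop readers generally reuse key and value objects rather than allocating one per record. Here T is simply String. With the real API, T would need to be a type that Avro can serialize, as Doug notes.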