Avro, mail # user - how to specify MultipleOutputs, MultipleInputs in using Avro mapred API

ey-chih chow 2010-08-17, 01:14
ey-chih chow 2010-08-17, 01:22
ey-chih chow 2010-08-18, 13:15
Harsh J 2010-08-18, 14:09
ey-chih chow 2010-08-18, 17:18
Doug Cutting 2010-08-18, 17:37
Re: how to specify MultipleOutputs, MultipleInputs in using Avro mapred API
Harsh J 2010-08-18, 17:49
On Wed, Aug 18, 2010 at 11:07 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> On 08/18/2010 10:18 AM, ey-chih chow wrote:
>> Thanks. But doing it this way, what advantage do we get from
>> Avro?
> The Avro MapReduce API is easiest to use when both inputs and outputs are
> Avro data.
> If inputs are not Avro data, but you want to use the rest of the Avro MR
> API, then you'd need to write an InputFormat that produces an AvroWrapper<T>
> where T is a type that Avro can serialize.
> Another alternative might be to first convert your inputs to be avro data
> files.  For example, one can use Avro's 'fromtext' tool to convert
> line-oriented files into equivalent compressed, splittable, Avro data files.
>  This could be done as log files are loaded into HDFS, since this tool
> accepts Hadoop paths as output.
> We hope to add more such tools for such conversion/ingest, e.g.:
> https://issues.apache.org/jira/browse/AVRO-458
Off-topic, but is any work being done on this already? I saw one
of them tagged 'GSOC', so I'd like to know before I sink something
> We also expect that systems like Flume will produce Avro data files.
> Doug

Harsh J
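Doug's first suggestion, an InputFormat whose records are AvroWrapper<T> values for some type T that Avro can serialize, can be illustrated with a sketch that needs only the JDK. As far as we know, Avro's own org.apache.avro.mapred.AvroInputFormat<T> is declared over AvroWrapper<T> keys and NullWritable values. The classes below (Wrapper, LineToWrapperReader, readAll) are hypothetical stand-ins, not Avro or Hadoop API. They show the adapter pattern a real RecordReader behind such an InputFormat would follow: each non-Avro text line becomes a String datum (Avro schema "string") stored in a single reused wrapper.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class WrapperReaderSketch {

    // Hypothetical stand-in for org.apache.avro.mapred.AvroWrapper<T>:
    // a mutable holder for one datum that is reused across records.
    static final class Wrapper<T> {
        private T datum;
        T datum() { return datum; }
        void datum(T d) { this.datum = d; }
    }

    // Hypothetical reader modeled loosely on the old mapred
    // RecordReader.next(key, value) contract: it fills the caller's key
    // and returns false once the input is exhausted.
    static final class LineToWrapperReader implements AutoCloseable {
        private final BufferedReader in;

        LineToWrapperReader(BufferedReader in) { this.in = in; }

        boolean next(Wrapper<String> key) throws IOException {
            String line = in.readLine();      // line terminator is dropped
            if (line == null) {
                return false;                 // end of input
            }
            key.datum(line);                  // String is Avro-serializable
            return true;
        }

        @Override
        public void close() throws IOException { in.close(); }
    }

    // Drains the reader the way a map task would, copying each datum out
    // because the wrapper instance is overwritten on every call to next().
    public static List<String> readAll(String text) {
        List<String> out = new ArrayList<>();
        try (LineToWrapperReader r =
                 new LineToWrapperReader(new BufferedReader(new StringReader(text)))) {
            Wrapper<String> key = new Wrapper<>();
            while (r.next(key)) {
                out.add(key.datum());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(readAll("GET /a\nGET /b\n"));  // prints [GET /a, GET /b]
    }
}
```

Reusing one wrapper object mirrors how Hadoop record readers typically avoid allocating a new key per record; that is also why the consumer copies the datum out before the next call.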
Doug Cutting 2010-08-18, 18:14
Harsh J 2010-08-18, 17:40
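Doug's other suggestion, converting line-oriented input with Avro's 'fromtext' tool, comes down at the datum level to writing each line in Avro's binary encoding. The sketch below encodes one line as an Avro "bytes" value as described in the Avro specification: the length as a zig-zag, base-128 varint long, followed by the raw bytes. Two points are assumptions rather than facts taken from the thread: that 'fromtext' models each line as a "bytes" datum, and that line terminators are dropped, which is what this sketch does. The container-file framing the tool also writes (header, schema metadata, sync markers, compressed blocks) is omitted, so this is an illustration of the encoding, not a replacement for the tool. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AvroBytesLineSketch {

    // Writes a long in Avro's binary form: zig-zag mapping first
    // (0 -> 0, -1 -> 1, 1 -> 2, 2 -> 4), then a varint that emits the
    // low-order 7 bits first and sets the high bit while more bytes follow.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Encodes one text line as an Avro "bytes" datum: a long length
    // followed by the line's UTF-8 bytes (no newline stored).
    public static byte[] encodeLine(String line) {
        byte[] data = line.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, data.length);
        out.write(data, 0, data.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // "hi" is 2 bytes long; 2 zig-zag encodes to 4, so this prints [4, 104, 105]
        System.out.println(Arrays.toString(encodeLine("hi")));
    }
}
```

For a 64-byte line the length 64 zig-zags to 128, which no longer fits in 7 bits, so the prefix becomes the two bytes 0x80 0x01.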