Avro user mailing list


Re: how to specify MultipleOutputs, MultipleInputs in using Avro mapred API
On Wed, Aug 18, 2010 at 11:07 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> On 08/18/2010 10:18 AM, ey-chih chow wrote:
>>
>> Thanks. But if we do it this way, what advantage do we get from
>> Avro?
>
> The Avro MapReduce API is easiest to use when both inputs and outputs are
> Avro data.
>
> If inputs are not Avro data, but you want to use the rest of the Avro MR
> API, then you'd need to write an InputFormat that produces an AvroWrapper<T>
> where T is a type that Avro can serialize.
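A minimal sketch of such an InputFormat for the old `org.apache.hadoop.mapred` API, assuming plain-text input where each line becomes an Avro `string` datum (`Utf8`). The class names `TextLinesAsAvroInputFormat` and `Utf8LineReader` are hypothetical. The key/value shape (`AvroWrapper<T>` key, `NullWritable` value) matches what Avro's own `AvroInputFormat` produces for Avro mappers. The reader simply delegates to Hadoop's `LineRecordReader` and rewraps each line.

```java
import java.io.IOException;

import org.apache.avro.mapred.AvroWrapper;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical InputFormat: reads text files and hands each line to the
// mapper as an AvroWrapper<Utf8> key with a NullWritable value.
public class TextLinesAsAvroInputFormat
    extends FileInputFormat<AvroWrapper<Utf8>, NullWritable> {

  @Override
  public RecordReader<AvroWrapper<Utf8>, NullWritable> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    // Reuse Hadoop's line splitting logic; only the record type changes.
    return new Utf8LineReader(new LineRecordReader(job, (FileSplit) split));
  }

  static class Utf8LineReader
      implements RecordReader<AvroWrapper<Utf8>, NullWritable> {
    private final LineRecordReader lines;
    private final LongWritable offset; // byte offset of the line (unused)
    private final Text text;           // raw line contents

    Utf8LineReader(LineRecordReader lines) {
      this.lines = lines;
      this.offset = lines.createKey();
      this.text = lines.createValue();
    }

    @Override
    public boolean next(AvroWrapper<Utf8> key, NullWritable value)
        throws IOException {
      if (!lines.next(offset, text)) {
        return false; // end of split
      }
      key.datum(new Utf8(text.toString())); // wrap the line as an Avro string
      return true;
    }

    @Override
    public AvroWrapper<Utf8> createKey() { return new AvroWrapper<Utf8>(null); }

    @Override
    public NullWritable createValue() { return NullWritable.get(); }

    @Override
    public long getPos() throws IOException { return lines.getPos(); }

    @Override
    public float getProgress() throws IOException { return lines.getProgress(); }

    @Override
    public void close() throws IOException { lines.close(); }
  }
}
```

To use it, a job would register it with `job.setInputFormat(TextLinesAsAvroInputFormat.class)` and declare a matching input schema, e.g. `AvroJob.setInputSchema(job, Schema.create(Schema.Type.STRING))`. Because `setInputSchema` may also install `AvroInputFormat` as a default, it seems safest to set the input format afterwards.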
>
> Another alternative might be to first convert your inputs to Avro data
> files.  For example, one can use Avro's 'fromtext' tool to convert
> line-oriented files into equivalent compressed, splittable, Avro data files.
>  This could be done as log files are loaded into HDFS, since this tool
> accepts Hadoop paths as output.
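For illustration, a minimal sketch of the kind of conversion described above, written against Avro's Java `DataFileWriter`. It is not the source of the `fromtext` tool. The class name `TextToAvro`, the `bytes` schema (one datum per line) and deflate level 6 are assumptions. Deflate covers the "compressed" part, and the sync markers that `DataFileWriter` writes between blocks are what let MapReduce split the resulting file.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.avro.Schema;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumWriter;

// Hypothetical converter: writes each line of a text file as one Avro
// "bytes" datum in a deflate-compressed Avro data file.
public class TextToAvro {

  // Returns the number of lines written.
  public static long convert(String in, String out) throws IOException {
    Schema schema = Schema.create(Schema.Type.BYTES);
    long count = 0;
    try (BufferedReader reader =
             Files.newBufferedReader(Paths.get(in), StandardCharsets.UTF_8);
         DataFileWriter<ByteBuffer> writer =
             new DataFileWriter<>(new GenericDatumWriter<ByteBuffer>(schema))) {
      writer.setCodec(CodecFactory.deflateCodec(6)); // must precede create()
      writer.create(schema, new File(out));          // writes header + sync marker
      String line;
      while ((line = reader.readLine()) != null) {
        writer.append(ByteBuffer.wrap(line.getBytes(StandardCharsets.UTF_8)));
        count++;
      }
    } // closing the writer flushes the final block
    return count;
  }
}
```

To write straight into HDFS, as suggested for log ingest, one could pass the stream returned by Hadoop's `FileSystem.create(Path)` to `DataFileWriter.create(Schema, OutputStream)` instead of a local `File`.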
>
> We hope to add more tools for such conversion/ingest, e.g.:
>
> https://issues.apache.org/jira/browse/AVRO-458
Off-topic, but is anyone already working on this? I saw one of them
tagged 'GSOC', so I'd like to know before I sink time into it.
>
> We also expect that systems like Flume will produce Avro data files.
>
> Doug
>

--
Harsh J
www.harshj.com