Avro >> mail # user >> how to specify MultipleOutputs, MultipleInputs in using Avro mapred API


+
ey-chih chow 2010-08-17, 01:14
+
ey-chih chow 2010-08-17, 01:22
+
ey-chih chow 2010-08-18, 13:15
+
Harsh J 2010-08-18, 14:09
+
ey-chih chow 2010-08-18, 17:18
+
Doug Cutting 2010-08-18, 17:37
Re: how to specify MultipleOutputs, MultipleInputs in using Avro mapred API
On Wed, Aug 18, 2010 at 11:07 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> On 08/18/2010 10:18 AM, ey-chih chow wrote:
>>
>> Thanks. But if we do it this way, what advantage do we get from
>> Avro?
>
> The Avro MapReduce API is easiest to use when both inputs and outputs are
> Avro data.
>
> If inputs are not Avro data, but you want to use the rest of the Avro MR
> API, then you'd need to write an InputFormat that produces an AvroWrapper<T>
> where T is a type that Avro can serialize.
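
To illustrate the adapter idea in the paragraph above, here is a minimal, dependency-free sketch rather than a real Hadoop InputFormat. `Wrapper<T>` is a hypothetical stand-in for Avro's `AvroWrapper<T>` (it mirrors only the `datum()` getter and setter), and `LineWrappingReader` is a hypothetical stand-in for the record reader such an InputFormat would return. An actual implementation would extend Hadoop's `FileInputFormat` and use Avro's own classes; the names and the sample input here are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Dependency-free sketch of wrapping non-Avro (line-oriented) input so that
// downstream code only ever sees wrapper objects. Not a real Hadoop class.
final class TextToWrapperDemo {

  /** Hypothetical stand-in for AvroWrapper<T>: a reusable holder for one datum. */
  static final class Wrapper<T> {
    private T datum;
    T datum() { return datum; }
    void datum(T value) { this.datum = value; }
  }

  /** Hypothetical stand-in for a record reader: one text line per record. */
  static final class LineWrappingReader implements AutoCloseable {
    private final BufferedReader in;

    LineWrappingReader(BufferedReader in) { this.in = in; }

    /** Creates the wrapper object that next() will refill for each record. */
    Wrapper<String> createKey() { return new Wrapper<>(); }

    /** Stores the next line in key; returns false at end of input. */
    boolean next(Wrapper<String> key) throws IOException {
      String line = in.readLine();
      if (line == null) {
        return false;
      }
      key.datum(line);
      return true;
    }

    @Override
    public void close() throws IOException { in.close(); }
  }

  public static void main(String[] args) throws IOException {
    // Made-up sample input standing in for a log file.
    String log = "GET /index.html\nGET /about.html\n";
    try (LineWrappingReader reader =
             new LineWrappingReader(new BufferedReader(new StringReader(log)))) {
      Wrapper<String> key = reader.createKey();
      while (reader.next(key)) {
        System.out.println(key.datum());
      }
    }
  }
}
```

The wrapper is created once and refilled for every record, which is the usual pattern for Hadoop record readers; T is a plain string here only because a line of text is the simplest type to show.
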
>
> Another alternative might be to first convert your inputs to Avro data
> files.  For example, one can use Avro's 'fromtext' tool to convert
> line-oriented files into equivalent compressed, splittable Avro data files.
> This could be done as log files are loaded into HDFS, since this tool
> accepts Hadoop paths as output.
>
> We hope to add more tools for this kind of conversion/ingest, e.g.:
>
> https://issues.apache.org/jira/browse/AVRO-458
Off-topic, but is anyone already working on this? I saw one of the
issues tagged 'GSOC', so I'd like to know before I sink any effort
into it.
>
> We also expect that systems like Flume will produce Avro data files.
>
> Doug
>

--
Harsh J
www.harshj.com
+
Doug Cutting 2010-08-18, 18:14
+
Harsh J 2010-08-18, 17:40