Flume >> mail # dev >> Custom Avro serializer


Andrew Stevenson 2013-08-12, 10:43
Re: Custom Avro serializer
The Apache Avro library provides a Schema class to let you construct
your own schema at runtime:
http://avro.apache.org/docs/current/api/java/org/apache/avro/Schema.html.
An example can be seen in this test case of Apache Avro:
https://github.com/apache/avro/blob/release-1.7.5/lang/java/avro/src/test/java/org/apache/avro/generic/TestGenericData.java#L90
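As a rough illustration of the runtime-schema idea (this is not taken
from the linked test case), a schema that arrives as a JSON string can
be parsed with Schema.Parser and used to fill a GenericRecord. The
class name, the "Trade" schema, its field names, and the
comma-separated body layout are all made up for the example; in a
Flume serializer the JSON would presumably come from an event header.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class RuntimeSchemaSketch {

    // Hypothetical schema text; in practice it would arrive at runtime,
    // for example in an event header.
    static final String TRADE_SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"Trade\",\"fields\":["
        + "{\"name\":\"symbol\",\"type\":\"string\"},"
        + "{\"name\":\"quantity\",\"type\":\"int\"}]}";

    // Parse the schema text, then fill a generic record from a
    // comma-separated body whose values are in schema field order.
    // Assumption: only flat records with primitive fields are handled.
    static GenericRecord toRecord(String schemaJson, String csvBody) {
        Schema schema = new Schema.Parser().parse(schemaJson);
        String[] values = csvBody.split(",");
        GenericRecord record = new GenericData.Record(schema);
        for (Schema.Field field : schema.getFields()) {
            String raw = values[field.pos()];
            switch (field.schema().getType()) {
                case INT:
                    record.put(field.name(), Integer.parseInt(raw));
                    break;
                case LONG:
                    record.put(field.name(), Long.parseLong(raw));
                    break;
                case DOUBLE:
                    record.put(field.name(), Double.parseDouble(raw));
                    break;
                default:
                    // Everything else is kept as a string in this sketch.
                    record.put(field.name(), raw);
            }
        }
        return record;
    }

    public static void main(String[] args) {
        GenericRecord record = toRecord(TRADE_SCHEMA_JSON, "ABC,100");
        System.out.println(record.getSchema().getName() + ": " + record);
    }
}
```

Nothing here depends on a generated class, so the same method works
for any flat schema handed to it at runtime.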

Note that a single Avro data file cannot carry more than one schema,
unless the schemas are all members of a single union. I don't know
enough about Flume to say whether it can separate events out to
different files, but it seems logical that it would already have such
a feature, and that would probably be a better solution than
union-ing the schemas together.
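
To picture the "separate files" option, here is a small sketch in
plain Java that groups events by a type header so that each output
file would only ever see one schema. It does not use the Flume API:
the Event class is a stand-in, and the "eventType" header name and
the path layout are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaPartitionSketch {

    // Minimal stand-in for an event: string headers plus a body.
    static final class Event {
        final Map<String, String> headers;
        final String body;

        Event(Map<String, String> headers, String body) {
            this.headers = headers;
            this.body = body;
        }
    }

    // Hypothetical header that names the event's type (and so its schema).
    static final String TYPE_HEADER = "eventType";

    // Group events by type, preserving first-seen order of the types.
    // Events without the header fall into an "unknown" group.
    static Map<String, List<Event>> partitionByType(List<Event> events) {
        Map<String, List<Event>> byType = new LinkedHashMap<>();
        for (Event event : events) {
            String type = event.headers.getOrDefault(TYPE_HEADER, "unknown");
            byType.computeIfAbsent(type, key -> new ArrayList<>()).add(event);
        }
        return byType;
    }

    // One directory per type, so each data file holds a single schema.
    static String outputPath(String baseDir, String type) {
        return baseDir + "/" + type + "/events.avro";
    }
}
```

Each group could then be written with its own schema to its own
file, which avoids the union entirely.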

On Mon, Aug 12, 2013 at 4:13 PM, Andrew Stevenson
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> My Java experience is limited, so maybe you guys can help. What I would like to do is have a custom Avro serializer that generates Avro records from a schema supplied in the event header and fields in the event body. Is this possible?
>
> I have seen the following examples, but they all have fixed schemas or determine the schemas from predefined classes:
>
> https://github.com/brockn/avro-flume-hive-example/blob/master/src/main/java/com/cloudera/flume/serialization/FlumeEventStringBodyAvroEventSerializer.java
> https://github.com/mpercy/flume-rtq-hadoop-summit-2013/blob/master/serializer/src/main/java/com/cloudera/flume/demo/CSVAvroSerializer.java
>
> I would like to use Flume to stream in events from my production systems. Ideally, each event would end up at a sink that routes it to the correct HDFS folder for later querying with Hive. I could receive multiple event types, e.g. products, trades, etc.
>
>
> Regards
>
> Andrew Stevenson
> Data Warehouse & Business Intelligence
>
> IMC financial markets | Strawinskylaan 377, WTC B-tower, 1077 XX Amsterdam | www.imc.nl <http://www.imc.nl/>
> P +31 (0)20 795 6103 | E [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>

--
Harsh J