Flume >> mail # user >> Re: FLUME AVRO


Re: FLUME AVRO
Mohit,
For historical reasons, the default fileType for HDFS sink is SequenceFile.

If you want Avro container format, then you must use fileType = DataStream
and use an event serializer that supports Avro, such as AVRO_EVENT.

See the user guide for the HDFS sink config options:
http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
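
A minimal sketch of what that configuration might look like (the agent name `a1`, sink name `k1`, and HDFS path are illustrative assumptions, not from the thread):

```properties
# Write Avro container files instead of the default SequenceFile.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.serializer = avro_event
```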

BTW, the AVRO_EVENT event serializer has some of its own options to control
compression, sync interval, etc., which are unfortunately not documented, but
you can find them in this file:
https://git-wip-us.apache.org/repos/asf?p=flume.git;a=blob;f=flume-ng-core/src/main/java/org/apache/flume/serialization/AvroEventSerializerConfigurationConstants.java;h=cce67166f270bc7e4134f4aa577e1a01e88d409d;hb=trunk

They are syncIntervalBytes and compressionCodec.
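
For example, for a sink named `k1` on an agent named `a1` (illustrative names), they would be set as sub-properties of the serializer:

```properties
# Sub-properties of the AVRO_EVENT serializer. The values here are
# illustrative; snappy requires the Snappy codec to be available.
a1.sinks.k1.serializer.syncIntervalBytes = 2048000
a1.sinks.k1.serializer.compressionCodec = snappy
```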

Regards,
Mike

On Sun, Aug 12, 2012 at 5:34 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:

>
>
> On Sun, Aug 12, 2012 at 5:29 PM, Mike Percy <[EMAIL PROTECTED]> wrote:
>
>> Assuming you are writing to a file or HDFS, look at the
>> EventSerializer interface; there is an abstract class that implements that
>> interface which you can use for writing Avro.
>>
>>
>> https://cwiki.apache.org/confluence/display/FLUME/Flume+1.x+Event+Serializers
>>
>> http://flume.apache.org/releases/content/1.2.0/apidocs/org/apache/flume/serialization/EventSerializer.html
>>
>> This is an out-of-the-box Avro serializer that ships with Flume (its
>> alias is AVRO_EVENT):
>>
>> http://flume.apache.org/releases/content/1.2.0/apidocs/org/apache/flume/serialization/FlumeEventAvroEventSerializer.html
>>
>> If you want to use your own Avro schema then you can just implement this
>> abstract class and override the convert() method:
>>
>> http://flume.apache.org/releases/content/1.2.0/apidocs/org/apache/flume/serialization/AbstractAvroEventSerializer.html
>>
>>
>
> Is the data that is written to the HDFS file in Avro format, using an Avro
> data file? From what I understand, data that is written to HDFS is not in
> Avro format and goes into a sequence file.
>
>> Regards,
>> Mike
>>
>>
>> On Sun, Aug 12, 2012 at 6:15 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>>> Abhishek,
>>>
>>> Moving this to user@flume lists, as it is Flume specific.
>>>
>>> P.s. Please do not cross post to multiple lists, it does not guarantee
>>> you a faster response nor is mailing to a *-dev list relevant to your
>>> question here. Help avoid additional inbox noise! :)
>>>
>>> On Thu, Aug 9, 2012 at 10:43 PM, abhiTowson cal
>>> <[EMAIL PROTECTED]> wrote:
>>> > hi all,
>>> >
>>> > can log data be converted into avro,when data is sent from source to
>>> sink.
>>> >
>>> > Regards
>>> > Abhishek
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>
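
For reference, the custom-schema approach suggested above (extending AbstractAvroEventSerializer and overriding convert()) might look roughly like the sketch below. The class name, schema, and field mapping are illustrative assumptions; the class needs flume-ng-core and avro on the classpath, and a nested Builder implementing EventSerializer.Builder is also required so Flume can instantiate it from the sink's serializer property:

```java
import java.io.OutputStream;
import java.nio.ByteBuffer;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.flume.Event;
import org.apache.flume.serialization.AbstractAvroEventSerializer;

// Hypothetical custom serializer: writes each Flume event as a record of a
// user-defined Avro schema instead of the built-in flume.Event schema.
public class MyAvroEventSerializer
    extends AbstractAvroEventSerializer<GenericRecord> {

  // Illustrative schema: a single "body" bytes field.
  private static final Schema SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"MyRecord\",\"fields\":"
      + "[{\"name\":\"body\",\"type\":\"bytes\"}]}");

  private final OutputStream out;

  private MyAvroEventSerializer(OutputStream out) {
    this.out = out;
  }

  @Override
  protected Schema getSchema() {
    return SCHEMA;
  }

  @Override
  protected OutputStream getOutputStream() {
    return out;
  }

  @Override
  protected GenericRecord convert(Event event) {
    // Map the Flume event into the custom Avro record type.
    GenericRecord record = new GenericData.Record(SCHEMA);
    record.put("body", ByteBuffer.wrap(event.getBody()));
    return record;
  }

  // A nested Builder implementing EventSerializer.Builder would go here
  // (omitted for brevity) so the class can be referenced by its fully
  // qualified name in the sink's serializer property.
}
```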