Re: FLUME AVRO
Mohit,
For historical reasons, the default fileType for HDFS sink is SequenceFile.

If you want Avro container format, then you must use fileType = DataStream
and use an event serializer that supports Avro, such as AVRO_EVENT.

See the user guide for the HDFS sink config options:
http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
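
For example, a minimal sink configuration along those lines might look like
the following (the agent, channel, and sink names and the HDFS path are just
placeholders for your own):

  agent1.sinks.hdfsSink.type = hdfs
  agent1.sinks.hdfsSink.channel = memChannel
  agent1.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/avro-events
  agent1.sinks.hdfsSink.hdfs.fileType = DataStream
  agent1.sinks.hdfsSink.hdfs.fileSuffix = .avro
  agent1.sinks.hdfsSink.serializer = avro_event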

BTW, the AVRO_EVENT serializer has some options of its own to control
compression, sync interval, etc., which are unfortunately not documented, but
you can find them in this file:
https://git-wip-us.apache.org/repos/asf?p=flume.git;a=blob;f=flume-ng-core/src/main/java/org/apache/flume/serialization/AvroEventSerializerConfigurationConstants.java;h=cce67166f270bc7e4134f4aa577e1a01e88d409d;hb=trunk

They are syncIntervalBytes and compressionCodec.
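
If you need them, they can be set on the sink's serializer context, e.g.
(assuming the same sink name as in the example above; the values shown are
only illustrative):

  agent1.sinks.hdfsSink.serializer.syncIntervalBytes = 2048000
  agent1.sinks.hdfsSink.serializer.compressionCodec = snappy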

Regards,
Mike

On Sun, Aug 12, 2012 at 5:34 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:

>
>
> On Sun, Aug 12, 2012 at 5:29 PM, Mike Percy <[EMAIL PROTECTED]> wrote:
>
>> Assuming you are writing to a file or HDFS, look at the
>> EventSerializer interface - there is an abstract class implementing that
>> interface which you can use for writing Avro.
>>
>>
>> https://cwiki.apache.org/confluence/display/FLUME/Flume+1.x+Event+Serializers
>>
>> http://flume.apache.org/releases/content/1.2.0/apidocs/org/apache/flume/serialization/EventSerializer.html
>>
>> This is an out-of-the-box Avro serializer that ships with Flume (its
>> alias is AVRO_EVENT):
>>
>> http://flume.apache.org/releases/content/1.2.0/apidocs/org/apache/flume/serialization/FlumeEventAvroEventSerializer.html
>>
>> If you want to use your own Avro schema then you can just implement this
>> abstract class and override the convert() method:
>>
>> http://flume.apache.org/releases/content/1.2.0/apidocs/org/apache/flume/serialization/AbstractAvroEventSerializer.html
>>
>>
>
> Is the data that is written to the HDFS file in Avro format, using an Avro
> data file? From what I understand, the data written to HDFS is not in Avro
> format and goes into a sequence file.
>
>> Regards,
>> Mike
>>
>>
>> On Sun, Aug 12, 2012 at 6:15 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>>> Abhishek,
>>>
>>> Moving this to the user@flume list, as it is Flume-specific.
>>>
>>> P.S. Please do not cross-post to multiple lists; it does not guarantee
>>> you a faster response, nor is mailing a *-dev list relevant to your
>>> question here. Help avoid additional inbox noise! :)
>>>
>>> On Thu, Aug 9, 2012 at 10:43 PM, abhiTowson cal
>>> <[EMAIL PROTECTED]> wrote:
>>> > hi all,
>>> >
>>> > Can log data be converted into Avro when data is sent from source to
>>> > sink?
>>> >
>>> > Regards
>>> > Abhishek
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>
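
Below is a rough, untested sketch of the custom-serializer approach Mike
describes above (extending AbstractAvroEventSerializer and overriding
convert()), written against the Flume 1.2 API. The LogRecord class, the class
names, and the UTF-8 body mapping are all made up for illustration; the base
class appears to use Avro reflection to write records, which is why a plain
POJO is used as the record type here.

import java.io.OutputStream;
import java.nio.charset.Charset;

import org.apache.avro.Schema;
import org.apache.avro.reflect.ReflectData;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.serialization.AbstractAvroEventSerializer;
import org.apache.flume.serialization.EventSerializer;

public class MyLogAvroEventSerializer
    extends AbstractAvroEventSerializer<MyLogAvroEventSerializer.LogRecord> {

  // Hypothetical record type; Avro derives its schema from it via reflection.
  public static class LogRecord {
    public String body;
  }

  private static final Schema SCHEMA = ReflectData.get().getSchema(LogRecord.class);

  private final OutputStream out;

  private MyLogAvroEventSerializer(OutputStream out) {
    this.out = out;
  }

  @Override
  protected OutputStream getOutputStream() {
    return out;
  }

  @Override
  protected Schema getSchema() {
    return SCHEMA;
  }

  @Override
  protected LogRecord convert(Event event) {
    // Map the Flume event into our record; here we just keep the body as a string.
    LogRecord record = new LogRecord();
    record.body = new String(event.getBody(), Charset.forName("UTF-8"));
    return record;
  }

  // The Builder is what the sink config would reference, e.g. (hypothetical package):
  //   agent1.sinks.hdfsSink.serializer = com.example.MyLogAvroEventSerializer$Builder
  public static class Builder implements EventSerializer.Builder {
    @Override
    public EventSerializer build(Context context, OutputStream out) {
      MyLogAvroEventSerializer serializer = new MyLogAvroEventSerializer(out);
      serializer.configure(context);
      return serializer;
    }
  }
}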