Flume user mailing list: Converting text to avro in Flume

Re: Converting text to avro in Flume
There was a mistake in my configuration: I had put the hdfs. prefix in front of
serializer.
I changed

tier1.sinks.sink1.hdfs.serializer = avro_event

to

tier1.sinks.sink1.serializer = avro_event
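For reference, the sink section with both fixes from this thread applied (hdfs.fileType = DataStream, and the serializer keys placed directly under the sink rather than under hdfs.) would look roughly like the sketch below. It reuses the names and paths already in this thread and follows the HDFS sink and Avro Event Serializer sections of the Flume user guide; it is a sketch, not a verified fix. Note that appendNewline is documented for the text serializer, not for avro_event, so it is left out here.

```properties
# Sink section assembled from the settings in this thread (sketch).
tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.channel = c1
tier1.sinks.sink1.hdfs.path = /tmp/flumecollector
tier1.sinks.sink1.hdfs.filePrefix = access_log
tier1.sinks.sink1.hdfs.fileSuffix = .avro
# DataStream: the serializer writes the file contents directly (default is SequenceFile)
tier1.sinks.sink1.hdfs.fileType = DataStream
# Serializer keys live under the sink itself, not under hdfs.
tier1.sinks.sink1.serializer = avro_event
# Avro block compression codec, an Avro Event Serializer option
tier1.sinks.sink1.serializer.compressionCodec = snappy
```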

But it is still generating a sequence file. This is what I get.

SEQ!org.apache.hadoop.io.LongWritableorg.apache.hadoop.io.TextK???2-%??-/??
A??,? ?<message>xmldata</message>
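The SEQ prefix followed by the org.apache.hadoop.io.LongWritable and org.apache.hadoop.io.Text class names is the SequenceFile header; an Avro object container file would instead begin with the bytes O, b, j, 1. A minimal Java sketch for checking which format actually landed in HDFS is below. It looks only at those leading magic bytes; the class name MagicBytes is hypothetical, it assumes the file has first been copied to local disk (for example with hdfs dfs -get), and it needs Java 9 or later for readNBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: identifies a file by its leading magic bytes.
// SequenceFile headers start with "SEQ" plus a version byte;
// Avro object container files start with "Obj" followed by the byte 1.
final class MagicBytes {
    private MagicBytes() {}

    static String detect(byte[] head) {
        if (head.length >= 3 && head[0] == 'S' && head[1] == 'E' && head[2] == 'Q') {
            return "SequenceFile";
        }
        if (head.length >= 4 && head[0] == 'O' && head[1] == 'b' && head[2] == 'j' && head[3] == 1) {
            return "Avro container";
        }
        return "unknown";
    }

    // Reads at most the first four bytes of a local file.
    static String detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[4];
            int n = in.readNBytes(buf, 0, buf.length);
            return detect(Arrays.copyOf(buf, n));
        }
    }

    // Usage: java MagicBytes <local file> [more files...]
    public static void main(String[] args) throws IOException {
        for (String name : args) {
            System.out.println(name + ": " + detect(Paths.get(name)));
        }
    }
}
```

Run against a local copy of one of the access_log files, the dump above would be reported as a SequenceFile.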
On Fri, Oct 4, 2013 at 10:43 PM, Deepak Subhramanian <
[EMAIL PROTECTED]> wrote:

> Thanks Hari.
>
> I specified the fileType. This is what I have. I will try again and let
> you know.
>
> tier1.sources  = httpsrc1
> tier1.channels = c1
> tier1.sinks    = sink1
>
> tier1.sources.httpsrc1.bind     = 127.0.0.1
> tier1.sources.httpsrc1.type = http
> tier1.sources.httpsrc1.port = 9999
> tier1.sources.httpsrc1.channels = c1
> tier1.sources.httpsrc1.handler = spikes.flume.XMLHandler
> tier1.sources.httpsrc1.handler.nickname = HTTPTesting
>
> tier1.channels.c1.type   = memory
> tier1.channels.c1.capacity = 100
> #tier1.sinks.sink1.type         = logger
> tier1.sinks.sink1.channel      = c1
>
> tier1.sinks.sink1.type = hdfs
>
> tier1.sinks.sink1.hdfs.path = /tmp/flumecollector
> tier1.sinks.sink1.hdfs.filePrefix = access_log
> tier1.sinks.sink1.hdfs.fileSuffix = .avro
> tier1.sinks.sink1.hdfs.fileType = DataStream
> tier1.sinks.sink1.hdfs.serializer =  avro_event
>
> I also added this later.
> tier1.sinks.sink1.hdfs.serializer.appendNewline = true
> tier1.sinks.sink1.hdfs.serializer.compressionCodec = snappy
>
>
>
> On Fri, Oct 4, 2013 at 4:56 PM, Hari Shreedharan <
> [EMAIL PROTECTED]> wrote:
>
>> The default file type for the HDFS sink is SequenceFile. Set
>> hdfs.fileType to DataStream. See details here:
>> http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
>>
>>
>> Thanks,
>> Hari
>>
>> On Friday, October 4, 2013 at 6:52 AM, Deepak Subhramanian wrote:
>>
>> I tried using the HDFS sink with the serializer set to avro_event to
>> generate an Avro file, but it generates a sequence file instead. Isn't it
>> supposed to generate an Avro file with the default schema below? Or do I
>> have to convert the text to Avro data in my HTTPHandler source?
>>
>> "{ \"type\":\"record\", \"name\": \"Event\", \"fields\": [" +
>>     " {\"name\": \"headers\", \"type\": { \"type\": \"map\", \"values\": \"string\" } }, " +
>>     " {\"name\": \"body\", \"type\": \"bytes\" } ] }");
>>
>>
>> On Thu, Oct 3, 2013 at 3:36 PM, Deepak Subhramanian <
>> [EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>> I want to convert XML text into an Avro file and store it in HDFS. I
>> receive the XML as a POST request and have extended the HTTPHandler to
>> process it. Do I have to convert the text to Avro in the HTTPHandler, or
>> can the Avro sink or HDFS sink convert it to Avro directly with the right
>> configuration? I want to store the entire XML string in an Avro field.
>>
>> Thanks in advance for any inputs.
>> Deepak Subhramanian
>>
>>
>>
>>
>> --
>> Deepak Subhramanian
>>
>>
>>
>
>
> --
> Deepak Subhramanian
>

--
Deepak Subhramanian