Flume, mail # user - Converting text to avro in Flume


Deepak Subhramanian 2013-10-03, 14:36
Deepak Subhramanian 2013-10-04, 13:52
Hari Shreedharan 2013-10-04, 15:56
Re: Converting text to avro in Flume
Deepak Subhramanian 2013-10-04, 21:43
Thanks Hari.

I specified the fileType. This is what I have. I will try again and let
you know.

tier1.sources  = httpsrc1
tier1.channels = c1
tier1.sinks    = sink1

tier1.sources.httpsrc1.bind     = 127.0.0.1
tier1.sources.httpsrc1.type = http
tier1.sources.httpsrc1.port = 9999
tier1.sources.httpsrc1.channels = c1
tier1.sources.httpsrc1.handler = spikes.flume.XMLHandler
tier1.sources.httpsrc1.handler.nickname = HTTPTesting

tier1.channels.c1.type   = memory
tier1.channels.c1.capacity = 100
#tier1.sinks.sink1.type = logger
tier1.sinks.sink1.channel = c1
tier1.sinks.sink1.type = hdfs

tier1.sinks.sink1.hdfs.path = /tmp/flumecollector
tier1.sinks.sink1.hdfs.filePrefix = access_log
tier1.sinks.sink1.hdfs.fileSuffix = .avro
tier1.sinks.sink1.hdfs.fileType = DataStream
tier1.sinks.sink1.hdfs.serializer = avro_event

I also added this later.
tier1.sinks.sink1.hdfs.serializer.appendNewline = true
tier1.sinks.sink1.hdfs.serializer.compressionCodec = snappy
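For comparison, the Flume User Guide documents the HDFS sink's serializer keys directly under the sink name rather than under the `hdfs.` prefix (`sink1.serializer`, not `sink1.hdfs.serializer`). A minimal stanza along the lines the docs describe, assuming the same sink name, would be:

```
tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.channel = c1
tier1.sinks.sink1.hdfs.path = /tmp/flumecollector
tier1.sinks.sink1.hdfs.filePrefix = access_log
tier1.sinks.sink1.hdfs.fileSuffix = .avro
# DataStream is required: the default fileType (SequenceFile) wraps
# every event in a Hadoop sequence file regardless of the serializer.
tier1.sinks.sink1.hdfs.fileType = DataStream
# serializer.* keys sit under the sink name, not under hdfs.*
tier1.sinks.sink1.serializer = avro_event
tier1.sinks.sink1.serializer.compressionCodec = snappy
```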

On Fri, Oct 4, 2013 at 4:56 PM, Hari Shreedharan
<[EMAIL PROTECTED]> wrote:

>  The default data type for HDFS Sink is Sequence file. Set the
> hdfs.fileType to DataStream. See details here:
> http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
>
>
> Thanks,
> Hari
>
> On Friday, October 4, 2013 at 6:52 AM, Deepak Subhramanian wrote:
>
> I tried using the HDFS Sink to generate the avro file by setting the
> serializer to avro_event. But it is not generating an avro file; it is
> generating a sequence file. Is it not supposed to generate an avro file
> with the default schema? Or do I have to generate the avro data from
> text in my HTTPHandler source?
>
> { "type": "record", "name": "Event", "fields": [
>   { "name": "headers", "type": { "type": "map", "values": "string" } },
>   { "name": "body", "type": "bytes" } ] }
>
>
> On Thu, Oct 3, 2013 at 3:36 PM, Deepak Subhramanian <
> [EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I want to convert XML files in text to an avro file and store them in
> HDFS. I get the XML files as a POST request. I extended the HTTPHandler
> to process the XML POST request. Do I have to convert the text data to
> avro in the HTTPHandler, or does the Avro Sink or HDFS Sink convert it
> directly to avro with some configuration? I want to store the entire
> XML string in an avro field.
>
> Thanks in advance for any inputs.
> Deepak Subhramanian
>
>
>
>
> --
> Deepak Subhramanian
>
>
>
--
Deepak Subhramanian
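The schema quoted above is the headers-map-plus-body record that the avro_event serializer writes by default. A few lines of standard-library Python can parse it to confirm the shape; the schema text here is transcribed from the quote, and nothing Flume-specific is imported:

```python
import json

# Default schema written by Flume's avro_event serializer:
# a map of string headers plus the raw event body as bytes.
schema = json.loads("""
{ "type": "record", "name": "Event", "fields": [
  { "name": "headers", "type": { "type": "map", "values": "string" } },
  { "name": "body", "type": "bytes" } ] }
""")

field_names = [f["name"] for f in schema["fields"]]
print(field_names)  # → ['headers', 'body']
```

Because the body is plain bytes, an XML payload stored through this serializer comes back as the raw XML string, with no per-field structure.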
Deepak Subhramanian 2013-10-06, 22:27
Deepak Subhramanian 2013-10-06, 22:38
Hari Shreedharan 2013-10-06, 23:23
Deepak Subhramanian 2013-10-07, 10:56