Flume >> mail # user >> Converting text to avro in Flume


Re: Converting text to avro in Flume
Thanks Hari.

I specified the fileType. This is what I have. I will try again and let
you know.

tier1.sources  = httpsrc1
tier1.channels = c1
tier1.sinks    = sink1

tier1.sources.httpsrc1.bind     = 127.0.0.1
tier1.sources.httpsrc1.type = http
tier1.sources.httpsrc1.port = 9999
tier1.sources.httpsrc1.channels = c1
tier1.sources.httpsrc1.handler = spikes.flume.XMLHandler
tier1.sources.httpsrc1.handler.nickname = HTTPTesting

tier1.channels.c1.type   = memory
tier1.channels.c1.capacity = 100
#tier1.sinks.sink1.type         = logger
tier1.sinks.sink1.channel      = c1
tier1.sinks.sink1.type         = hdfs

tier1.sinks.sink1.hdfs.path = /tmp/flumecollector
tier1.sinks.sink1.hdfs.filePrefix = access_log
tier1.sinks.sink1.hdfs.fileSuffix = .avro
tier1.sinks.sink1.hdfs.fileType = DataStream
tier1.sinks.sink1.hdfs.serializer =  avro_event

I also added this later.
tier1.sinks.sink1.hdfs.serializer.appendNewline = true
tier1.sinks.sink1.hdfs.serializer.compressionCodec = snappy
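[An editorial aside, not from the thread: a quick way to check what the sink actually wrote is to look at the file's magic bytes. An Avro object container file starts with the four bytes `Obj\x01`, while a Hadoop SequenceFile starts with `SEQ`. A minimal stdlib-only Python sketch:]

```python
def classify_header(header: bytes) -> str:
    """Classify a file by its leading magic bytes.

    Avro object container files begin with b'Obj\\x01';
    Hadoop SequenceFiles begin with b'SEQ'.
    """
    if header.startswith(b"Obj\x01"):
        return "avro"
    if header.startswith(b"SEQ"):
        return "sequencefile"
    return "unknown"


def classify_file(path: str) -> str:
    """Read just the first four bytes of a local copy of the file."""
    with open(path, "rb") as f:
        return classify_header(f.read(4))
```

[For a file still in HDFS, something like `hdfs dfs -cat /tmp/flumecollector/access_log*.avro | head -c 4` shows the same header without copying the whole file.]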

On Fri, Oct 4, 2013 at 4:56 PM, Hari Shreedharan
<[EMAIL PROTECTED]> wrote:

>  The default data type for HDFS Sink is Sequence file. Set the
> hdfs.fileType to DataStream. See details here:
> http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
>
>
> Thanks,
> Hari
>
> On Friday, October 4, 2013 at 6:52 AM, Deepak Subhramanian wrote:
>
> I tried using the HDFS Sink to generate the avro file by using the
> serializer avro_event, but it is not generating an avro file; it writes a
> sequence file instead. Is it not supposed to generate an avro file with the
> default schema? Or do I have to generate the avro data from text in my
> HTTPHandler source?
>
>  "{ \"type\":\"record\", \"name\": \"Event\", \"fields\": [" +
>
>       " {\"name\": \"headers\", \"type\": { \"type\": \"map\",
> \"values\": \"string\" } }, " +
>       " {\"name\": \"body\", \"type\": \"bytes\" } ] }");
>
>
> On Thu, Oct 3, 2013 at 3:36 PM, Deepak Subhramanian <
> [EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I want to convert xml files in text to an avro file and store it in hdfs.
> I get the xml files as a post request. I extended the HTTPHandler to
> process the XML post request. Do I have to convert the data in text to avro
> in HTTPHandler, or does the Avro Sink or HDFSSink convert it directly to
> avro with some configuration details? I want to store the entire xml string
> in an avro variable.
>
> Thanks in advance for any inputs.
> Deepak Subhramanian
>
>
>
>
> --
> Deepak Subhramanian
>
>
>
--
Deepak Subhramanian
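[Editorial note on the schema quoted above: the `avro_event` serializer writes each Flume event as a record with a string-to-string `headers` map and a raw `bytes` body, so the whole XML document lands in `body`. A hypothetical Python sketch of that shape, using only the stdlib `json` module rather than the Avro library, to show how an XML payload maps onto the default schema:]

```python
import json

# The default avro_event schema quoted earlier in the thread.
EVENT_SCHEMA = json.loads("""
{ "type": "record", "name": "Event", "fields": [
  { "name": "headers", "type": { "type": "map", "values": "string" } },
  { "name": "body", "type": "bytes" } ] }
""")


def to_event(xml_string: str, headers: dict) -> dict:
    """Shape an XML payload the way avro_event serializes it:
    headers become a string map, the whole XML document becomes
    the 'body' bytes field."""
    return {"headers": dict(headers), "body": xml_string.encode("utf-8")}
```

[Under this reading, no custom conversion in the HTTPHandler is required just to store the XML string: the sink's serializer already puts the event body into the `body` field, and any per-event metadata can travel in `headers`.]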