Flume, mail # user - Flume 1.3.0 + HDFS Sink + S3N + avro_event + Hive…?


Matt Wise 2013-05-08, 17:42
We're still working on getting our POC of Flume up and running... right now we have log events that pass through our Flume nodes via a syslog source and are happily sent off to ElasticSearch for indexing. We're also sending these events to S3, but we're finding that they seem to be unreadable with the Avro tools.

> # S3 Output Sink
> agent.sinks.s3.type = hdfs
> agent.sinks.s3.channel = fc1
> agent.sinks.s3.hdfs.path = s3n://XXX:XXX@our_bucket/flume/events/%y-%m-%d/%H
> agent.sinks.s3.hdfs.rollInterval = 600
> agent.sinks.s3.hdfs.rollSize = 0
> agent.sinks.s3.hdfs.rollCount = 10000
> agent.sinks.s3.hdfs.batchSize = 10000
> agent.sinks.s3.hdfs.serializer = avro_event
> agent.sinks.s3.hdfs.fileType = SequenceFile
> agent.sinks.s3.hdfs.timeZone = UTC
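
A note for anyone landing on this thread from a search: as far as I can tell, with hdfs.fileType = SequenceFile the HDFS sink wraps events in Hadoop SequenceFile records, so the files that come out are SequenceFiles rather than Avro container files, and avro-tools only reads the latter. A minimal sketch of the variant that should produce real Avro container files, assuming Flume 1.3, where the user guide puts the serializer property at the sink level rather than under hdfs.*:

    # hypothetical variant: DataStream + avro_event => Avro container files
    agent.sinks.s3.type = hdfs
    agent.sinks.s3.channel = fc1
    agent.sinks.s3.hdfs.path = s3n://XXX:XXX@our_bucket/flume/events/%y-%m-%d/%H
    agent.sinks.s3.hdfs.fileType = DataStream
    agent.sinks.s3.serializer = avro_event

Files written that way should answer to avro-tools getschema and tojson.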
When we try to look at the Avro-serialized files, we get this error:

> [localhost avro]$ java -jar avro-tools-1.7.4.jar getschema FlumeData.1367857371493
> Exception in thread "main" java.io.IOException: Not a data file.
>         at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
>         at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
>         at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:89)
>         at org.apache.avro.tool.DataFileGetSchemaTool.run(DataFileGetSchemaTool.java:48)
>         at org.apache.avro.tool.Main.run(Main.java:80)
>         at org.apache.avro.tool.Main.main(Main.java:69)
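
That "Not a data file." message is avro-tools failing Avro's magic-byte check: an Avro container file starts with the four bytes O b j \001, while a Hadoop SequenceFile starts with S E Q plus a version byte. A quick way to check which kind of file the sink actually wrote, assuming a shell with od available:

    $ head -c 4 FlumeData.1367857371493 | od -c

If that prints S E Q, the file is a SequenceFile and avro-tools will refuse it, which would match the trace above.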

At this point we're a bit unclear: how are we supposed to use these FlumeData files with the normal Avro tools?

--Matt
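
For data already written with the configuration above, the event bodies can still be pulled back out with the plain SequenceFile API. A sketch, assuming the sink used its default hdfs.writeFormat of Writable, which as far as I can tell stores a LongWritable timestamp key and a BytesWritable event body; the class name here is made up for illustration:

import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;

// Dumps the raw event bodies from a Flume-written SequenceFile, one per line.
public class DumpFlumeSequenceFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path(args[0]);            // e.g. FlumeData.1367857371493
        FileSystem fs = path.getFileSystem(conf); // works for local, HDFS, or s3n paths
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        try {
            LongWritable key = new LongWritable();     // event timestamp written by the sink
            BytesWritable value = new BytesWritable(); // raw event body
            OutputStream out = System.out;
            while (reader.next(key, value)) {
                out.write(value.getBytes(), 0, value.getLength());
                out.write('\n');
            }
            out.flush();
        } finally {
            reader.close();
        }
    }
}

hadoop fs -text on the same path is a quicker sanity check, since it decodes SequenceFiles on its own.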
Eric Sammer 2013-05-08, 20:12
Matt Wise 2013-05-08, 20:21