NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Flume user mailing list: What does the file header mean? Flume always adds headers to the file


higkoohk 2013-05-15, 10:05
higkoohk 2013-05-15, 10:06
Re: What does the file header mean? Flume always adds headers to the file
You've probably figured this out by now, but those are Avro container files :)

See http://avro.apache.org
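The dump quoted further down in this thread is consistent with the Avro object container file layout: the `Obj` magic bytes, a metadata map holding `avro.codec` and `avro.schema`, a 16-byte sync marker, then blocks of serialized `Event` records (a `headers` map followed by the `body` bytes, which carry the original log line). The following is a minimal, stdlib-only Python sketch of that layout as described in the Avro specification. It assumes the `null` codec and the `Event` schema shown in the dump, and the helper names (`read_long`, `read_map`, `read_flume_events`, etc.) are hypothetical, not part of Flume or Avro. For real use, the official Avro libraries or `avro-tools tojson` are the safer route.

```python
import io
import json

AVRO_MAGIC = b"Obj\x01"  # first four bytes of every Avro object container file
SYNC_SIZE = 16           # length of the per-file sync marker


def read_long(stream: io.BytesIO) -> int:
    """Decode one Avro 'long': a zig-zag encoded, variable-length integer."""
    shift = 0
    raw = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("truncated Avro varint")
        byte = chunk[0]
        raw |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of this varint
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1)  # undo zig-zag encoding


def read_bytes(stream: io.BytesIO) -> bytes:
    """Avro 'bytes' (and 'string'): a long length followed by that many bytes."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("truncated Avro bytes value")
    return data


def read_string(stream: io.BytesIO) -> str:
    return read_bytes(stream).decode("utf-8")


def read_map(stream: io.BytesIO, read_value) -> dict:
    """Avro map: blocks of (count, key/value pairs...) ending with a zero count."""
    result = {}
    while True:
        count = read_long(stream)
        if count == 0:
            return result
        if count < 0:          # negative count: a block byte size follows
            count = -count
            read_long(stream)  # size is only needed for skipping; ignore it
        for _ in range(count):
            key = read_string(stream)
            result[key] = read_value(stream)


def read_flume_events(data: bytes) -> list:
    """Return (headers, body) pairs from an avro_event container file.

    Hypothetical helper: assumes the 'null' codec and Flume's Event schema
    (headers: map<string>, body: bytes), as in the dump quoted in this thread.
    """
    stream = io.BytesIO(data)
    if stream.read(4) != AVRO_MAGIC:
        raise ValueError("not an Avro object container file")
    meta = read_map(stream, read_bytes)
    if meta.get("avro.codec", b"null") != b"null":
        raise ValueError("only the 'null' codec is handled in this sketch")
    schema = json.loads(meta["avro.schema"])
    if [f["name"] for f in schema["fields"]] != ["headers", "body"]:
        raise ValueError("unexpected schema: " + schema.get("name", "?"))
    sync = stream.read(SYNC_SIZE)

    events = []
    while stream.tell() < len(data):
        count = read_long(stream)  # number of records in this block
        read_long(stream)          # serialized block size in bytes
        for _ in range(count):
            headers = read_map(stream, read_string)
            body = read_bytes(stream)
            events.append((headers, body))
        if stream.read(SYNC_SIZE) != sync:
            raise ValueError("sync marker mismatch: corrupt block")
    return events
```

For example, after copying a file out of HDFS (e.g. with `hadoop fs -get`), `read_flume_events(open(path, "rb").read())` would return a list of `(headers, body)` tuples, where `path` is a placeholder. If plain text lines are all that is wanted, the HDFS sink's default `serializer` (`TEXT`) writes only the event body followed by a newline, so removing the `serializer = avro_event` line (or setting it to `text`) should avoid the container framing; this is worth checking against the Flume user guide for the version in use.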

Regards
Mike

On Wed, May 15, 2013 at 3:06 AM, higkoohk <[EMAIL PROTECTED]> wrote:

> Maybe it's caused by 'tengine.sinks.hdfs4log.serializer = avro_event', but
> I still don't know why or how...
>
>
> 2013/5/15 higkoohk <[EMAIL PROTECTED]>
>
>> My flume.conf
>>
>>> tengine.sources = tengine
>>> tengine.sources.tengine.type = exec
>>> tengine.sources.tengine.command = tail -n +0 -F /data/log/tengine/access.log
>>> tengine.sources.tengine.channels = file4log
>>> tengine.sinks = hdfs4log
>>> tengine.sinks.hdfs4log.type = hdfs
>>> tengine.sinks.hdfs4log.channel = file4log
>>> tengine.sinks.hdfs4log.serializer = avro_event
>>> tengine.sinks.hdfs4log.hdfs.path = hdfs://hdfs.kisops.org:8020/flume/tengine
>>> tengine.sinks.hdfs4log.hdfs.filePrefix = access
>>> tengine.sinks.hdfs4log.hdfs.fileSuffix = .log
>>> tengine.sinks.hdfs4log.hdfs.rollInterval = 0
>>> tengine.sinks.hdfs4log.hdfs.rollCount = 0
>>> tengine.sinks.hdfs4log.hdfs.rollSize = 134217728
>>> tengine.sinks.hdfs4log.hdfs.batchSize = 1024
>>> tengine.sinks.hdfs4log.hdfs.threadsPoolSize = 1
>>> tengine.sinks.hdfs4log.hdfs.fileType = DataStream
>>> tengine.sinks.hdfs4log.hdfs.writeFormat = Text
>>> tengine.channels = file4log
>>> tengine.channels.file4log.type = file
>>> tengine.channels.file4log.capacity = 4096
>>> tengine.channels.file4log.transactionCapacity = 1024
>>> tengine.channels.file4log.checkpointDir = /data/log/hdfs
>>> tengine.channels.file4log.dataDirs = /data/log/loadrunner
>>
>>
>> When I look at the logs in HDFS, there are some headers in the files that
>> were not created by the app:
>>
>>> Obj[binary] avro.codec null avro.schema[binary]
>>> {"type":"record","name":"Event","fields":[{"name":"headers","type":{"type":"map","values":"string"}},{"name":"body","type":"bytes"}]}[binary]
>>> [binary]
>>> [binary] agent25.kisops.org|10.20.216.20|1368610557.341|200|207|255|GET /status?00000005 HTTP/1.1|0.000|52033467[binary]
>>
>>
>> See the image: [image not included in this archive]
>>
>>
>> What does it mean? How can I remove it, or when and how should this info be used?
>>
>> Many thanks!
>>
>
>
higkoohk 2013-06-09, 07:05