Flume >> mail # user >> What does the file header mean? Flume always adds headers to the files


Re: What does the file header mean? Flume always adds headers to the files
You probably figured this out by now but those are Avro container files :)

see http://avro.apache.org
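
Since the hdfs sink is writing Avro container files, the original log lines can be pulled back out with the standard Avro reader instead of a plain cat of the file. A rough sketch in Java (untested; it assumes the Avro jar is on the classpath and that one of the sink's output files has been copied locally, and the local file name below is made up), using the "Event" schema quoted further down (headers: map<string>, body: bytes):

import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

import java.io.File;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class DumpFlumeAvro {
    public static void main(String[] args) throws Exception {
        // hypothetical local copy of a file written by the hdfs4log sink
        File avroFile = new File("access.1368610557341.log");

        // the reader picks the embedded "Event" schema up from the file header itself
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(avroFile, new GenericDatumReader<GenericRecord>())) {
            System.out.println("schema: " + reader.getSchema());
            while (reader.hasNext()) {
                GenericRecord event = reader.next();
                // Flume event headers (may be empty unless interceptors set them)
                System.out.println("headers: " + event.get("headers"));

                // body holds the raw log line as bytes
                ByteBuffer body = (ByteBuffer) event.get("body");
                byte[] line = new byte[body.remaining()];
                body.get(line);
                System.out.println(new String(line, StandardCharsets.UTF_8));
            }
        }
    }
}

The avro-tools jar's "tojson" command should dump the same records as JSON if you just want a quick look. The leading "Obj" bytes, the avro.codec / avro.schema entries and the 16-byte sync marker you see in your dump are the container file's own header and framing, not something your app wrote.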

Regards
Mike

On Wed, May 15, 2013 at 3:06 AM, higkoohk <[EMAIL PROTECTED]> wrote:

> Maybe it is caused by 'tengine.sinks.hdfs4log.serializer = avro_event', but
> I still don't know why, or how to deal with it ...
>
>
> 2013/5/15 higkoohk <[EMAIL PROTECTED]>
>
>> My flume.conf
>>
>> tengine.sources = tengine
>>> tengine.sources.tengine.type = exec
>>> tengine.sources.tengine.command = tail -n +0 -F /data/log/tengine/access.log
>>> tengine.sources.tengine.channels = file4log
>>> tengine.sinks = hdfs4log
>>> tengine.sinks.hdfs4log.type = hdfs
>>> tengine.sinks.hdfs4log.channel = file4log
>>> tengine.sinks.hdfs4log.serializer = avro_event
>>> tengine.sinks.hdfs4log.hdfs.path = hdfs://hdfs.kisops.org:8020/flume/tengine
>>> tengine.sinks.hdfs4log.hdfs.filePrefix = access
>>> tengine.sinks.hdfs4log.hdfs.fileSuffix = .log
>>> tengine.sinks.hdfs4log.hdfs.rollInterval = 0
>>> tengine.sinks.hdfs4log.hdfs.rollCount = 0
>>> tengine.sinks.hdfs4log.hdfs.rollSize = 134217728
>>> tengine.sinks.hdfs4log.hdfs.batchSize = 1024
>>> tengine.sinks.hdfs4log.hdfs.threadsPoolSize = 1
>>> tengine.sinks.hdfs4log.hdfs.fileType = DataStream
>>> tengine.sinks.hdfs4log.hdfs.writeFormat = Text
>>> tengine.channels = file4log
>>> tengine.channels.file4log.type = file
>>> tengine.channels.file4log.capacity = 4096
>>> tengine.channels.file4log.transactionCapacity = 1024
>>> tengine.channels.file4log.checkpointDir = /data/log/hdfs
>>> tengine.channels.file4log.dataDirs = /data/log/loadrunner
>>
>>
>> When I look at the logs in HDFS, the files contain headers that were not created by my app:
>>
>>> Obj avro.codec null avro.schema�
>>> {"type":"record","name":"Event","fields":[{"name":"headers","type":{"type":"map","values":"string"}},{"name":"body","type":"bytes"}]}�"
>>> �,�)��E����5�Y� ��
>>> �� agent25.kisops.org|10.20.216.20|1368610557.341|200|207|255|GET
>>> /status?00000005 HTTP/1.1|0.000|52033467��
>>
>>
>> See the image:
>>
>>
>> What does this header mean? How can I remove it, or when and how should this info be used?
>>
>> Many thanks!
>>
>
>
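
If the goal is plain text log lines in HDFS rather than Avro-wrapped records, the usual fix (assuming Flume 1.x, where the hdfs sink's serializer defaults to TEXT) is to drop the avro_event line from the config above, or set it explicitly:

tengine.sinks.hdfs4log.serializer = TEXT

With the text serializer each event body is written out as a line, so the Obj/avro.schema header and the sync-marker bytes between records disappear; the existing fileType = DataStream setting already keeps the sink from adding SequenceFile framing.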