Flume >> mail # user >> Flume-NG agent issue on daily rotation files


Flume-NG agent issue on daily rotation files
Hello All, I'm trying to load the app servers' request logs into HDFS.

I get all the consolidated logs in one file per day. I'm running the Flume
agent with the following config:

##

agent.sources = apache

agent.sources.apache.type = exec
agent.sources.apache.command = cat /appserverlogs/requestfile/request.log.2013_06_07
agent.sources.apache.batchSize = 1

agent.sources.apache.channels = memoryChannel
agent.sources.apache.interceptors = itime ihost itype

# http://flume.apache.org/FlumeUserGuide.html#timestamp-interceptor
agent.sources.apache.interceptors.itime.type = timestamp

# http://flume.apache.org/FlumeUserGuide.html#host-interceptor
agent.sources.apache.interceptors.ihost.type = host
agent.sources.apache.interceptors.ihost.useIP = false
agent.sources.apache.interceptors.ihost.hostHeader = host

# http://flume.apache.org/FlumeUserGuide.html#static-interceptor
agent.sources.apache.interceptors.itype.type = static
agent.sources.apache.interceptors.itype.key = log_type
agent.sources.apache.interceptors.itype.value = request_logs

# http://flume.apache.org/FlumeUserGuide.html#memory-channel

agent.channels = memoryChannel

agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 1000
agent.channels.memoryChannel.transactionCapacity = 100
agent.channels.memoryChannel.keep-alive = 3
agent.channels.memoryChannel.byteCapacityBufferPercentage = 20

## Send to Flume Collector on 1.2.3.4 (Hadoop Slave Node)
# http://flume.apache.org/FlumeUserGuide.html#avro-sink

agent.sinks = AvroSink

agent.sinks.AvroSink.type = avro
agent.sinks.AvroSink.channel = memoryChannel
agent.sinks.AvroSink.hostname = h1.vgs.mypoints.com
agent.sinks.AvroSink.port = 4545

Here you can see that I'm using the cat command with a specific file.

As I said, I get one file per day with the date in its name.

Q: How can I configure the agent so that it keeps rotating the file name in
the cat command to pick up each day's new file? Currently, once the file is
loaded I have to stop the agent, change the config, and start the agent
again.
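One possible direction, as a sketch only (I haven't verified this against your Flume version): the exec source has a `shell` property that runs the command through a shell, so shell expansion can build the date-stamped file name at startup:

```
# Assumption: exec source's "shell" property is available in your Flume NG release
agent.sources.apache.shell = /bin/bash -c
agent.sources.apache.command = cat /appserverlogs/requestfile/request.log.$(date +%Y_%m_%d)
```

Note that `$(date ...)` is expanded only once, when the source starts, so this alone doesn't roll over at midnight; it would still need a scheduled agent restart, or a wrapper script that tails whichever file is current.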

On the Hadoop slave I have the collector running with the following config:

collector.sources = AvroIn

collector.sources.AvroIn.type = avro

collector.sources.AvroIn.bind = 0.0.0.0

collector.sources.AvroIn.port = 4545

collector.sources.AvroIn.channels = mc1 mc2

## Channels ########################################################

## Source writes to 2 channels, one for each sink (Fan Out)
collector.channels = mc1 mc2

collector.channels.mc1.type = memory

collector.channels.mc1.capacity = 1000

collector.channels.mc1.transactionCapacity = 100
collector.channels.mc1.keep-alive = 3
collector.channels.mc1.byteCapacityBufferPercentage = 20

collector.channels.mc2.type = memory

collector.channels.mc2.capacity = 1000

collector.channels.mc2.transactionCapacity = 100
collector.channels.mc2.keep-alive = 3
collector.channels.mc2.byteCapacityBufferPercentage = 20

## Sinks ###########################################################

collector.sinks = LocalOut HadoopOut

## Write a copy to Local Filesystem (Debugging)
# http://flume.apache.org/FlumeUserGuide.html#file-roll-sink

collector.sinks.LocalOut.type = file_roll
collector.sinks.LocalOut.sink.directory = /var/log/flume
collector.sinks.LocalOut.sink.rollInterval = 0
collector.sinks.LocalOut.channel = mc1

## Write to HDFS

# http://flume.apache.org/FlumeUserGuide.html#hdfs-sink

collector.sinks.HadoopOut.type = hdfs

collector.sinks.HadoopOut.channel = mc2

collector.sinks.HadoopOut.hdfs.path = /user/flume/events/%{log_type}/%{host}/%y-%m-%d

collector.sinks.HadoopOut.hdfs.fileType = DataStream
collector.sinks.HadoopOut.hdfs.writeFormat = Text
collector.sinks.HadoopOut.hdfs.rollSize = 0
collector.sinks.HadoopOut.hdfs.rollCount = 0
collector.sinks.HadoopOut.hdfs.rollInterval = 0

Q: The collector loads the file into HDFS with a .tmp extension. Until I
kill the collector it doesn't rotate the file to its normal name. I've
played with

collector.sinks.HadoopOut.hdfs.rollSize = 0
collector.sinks.HadoopOut.hdfs.rollCount = 0
collector.sinks.HadoopOut.hdfs.rollInterval = 0

but then it creates many files. I'm looking to create one file per day of
request logs.
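For what it's worth, here is one combination I would try (a sketch, not verified in your environment): keep size- and count-based rolling disabled, but give the sink a 24-hour roll interval plus an idle timeout so the file is closed (and renamed from .tmp) once events stop arriving:

```
collector.sinks.HadoopOut.hdfs.rollSize = 0
collector.sinks.HadoopOut.hdfs.rollCount = 0
# roll at most once a day (value is in seconds)
collector.sinks.HadoopOut.hdfs.rollInterval = 86400
# close the file after 5 minutes with no new events,
# which renames it from its .tmp name
collector.sinks.HadoopOut.hdfs.idleTimeout = 300
```

`hdfs.idleTimeout` defaults to 0 (disabled), which would match the behavior you're seeing where the .tmp file only closes when the collector is killed.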

I really appreciate any help on this issue.

-Sanjeev

Sanjeev Sagar

"Separate yourself from everything that separates you from others!"
- Nirankari Baba Hardev Singh ji