Flume >> mail # user >> Log Events get Lost - flume 1.3

Kumar, Deepak8 2013-04-16, 08:16
Re: Log Events get Lost - flume 1.3
Hi,

There are two issues with your configuration:

1) A batch size of 1 with the file channel is an anti-pattern. It results in
extremely poor performance because the file channel must do an fsync() (an
expensive disk operation required to ensure no data loss) for every single
event. Your batch size should probably be in the hundreds or thousands.

2) tail -F *will* lose data. There is a write-up on this in the documentation.
If you care about your data, you will want to use the Spooling Directory Source.
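For reference, a sketch of what the two changes might look like in the agent's properties file. The source name s2 matches the configuration quoted below; the spool directory path is an assumption, and the property names follow the Flume 1.x user guide's Spooling Directory Source section:

```properties
# Replace the exec/tail -F source with a spooling directory source,
# which acknowledges files only after they are fully delivered.
agent.sources.s2.type = spooldir
# Assumed directory: rotated log files must be moved here when complete
agent.sources.s2.spoolDir = /var/log/creditcard/spool
agent.sources.s2.channels = fileChannel
# Batch in the hundreds or thousands, never 1, so the file channel
# amortizes one fsync() over many events
agent.sources.s2.batchSize = 1000
```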

Issue #2 is worsened by issue #1. Because your batch size is so low, the
throughput of the file channel is extremely low, and since tail -F gives the
tail process no feedback (no backpressure), more data is being lost than would
otherwise be the case.
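To make the cost of issue #1 concrete, here is a small illustrative Python sketch (not Flume code) that simply counts how many fsync() calls a per-event commit forces compared with a batched commit, mimicking the file channel's sync-per-transaction behavior:

```python
import os
import tempfile

def count_fsyncs(num_events, batch_size):
    """Write num_events small records to a temp file, calling fsync()
    once per full batch (plus once for any trailing partial batch).
    Returns the number of fsync calls -- a rough stand-in for the
    file channel's per-commit disk sync."""
    syncs = 0
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            for i in range(1, num_events + 1):
                f.write(b"log event\n")
                if i % batch_size == 0:
                    f.flush()
                    os.fsync(f.fileno())
                    syncs += 1
            if num_events % batch_size:
                f.flush()
                os.fsync(f.fileno())
                syncs += 1
    finally:
        os.remove(path)
    return syncs

# batchSize = 1 forces one disk sync per event...
print(count_fsyncs(5000, 1))     # 5000
# ...while a batch size in the thousands amortizes the cost
print(count_fsyncs(5000, 1000))  # 5
```

At the 5000 events/sec described below, a sync-per-event channel has to complete 5000 fsync() calls every second, which a single disk cannot sustain; batching at 1000 cuts that to 5.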
Brock
On Tue, Apr 16, 2013 at 3:16 AM, Kumar, Deepak8 <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I have 10 flume agents configured on a single machine. A single log file
> has a frequency of 500 log events/sec, so across the 10 log files the
> logs are being generated at 5000 log events per second.
>
> If my channel capacity is 1 million, more than 70% of the log events are
> lost! If I increase the channel capacity to 50 million, then the flume
> agent takes more than 24 hours to transfer the log events from source to
> sink.
>
> The size of dataDir (agent.channels.fileChannel.dataDirs =
> /var/log/flume-ng/file-channel/data) is almost 2G all the time.
>
> Could you please guide me to an optimal configuration so that I don't
> lose any log events and the transfer rate is also good enough? My
> flume-conf.properties has the following contents:
>
> agent.channels = fileChannel
> agent.sinks = avroSink
>
> # Each sink's type must be defined
> agent.sinks.avroSink.type = avro
> agent.sinks.avroSink.hostname = spnnq01.nam.nsroot.net
> agent.sinks.avroSink.port = 1442
> agent.sinks.avroSink.batchSize = 1000
>
> # Specify the channel the sink should use
> agent.sinks.avroSink.channel = fileChannel
>
> # Each channel's type is defined.
> agent.channels.fileChannel.type = file
> agent.channels.fileChannel.checkpointDir = /var/log/flume-ng/file-channel/checkpoint
> agent.channels.fileChannel.dataDirs = /var/log/flume-ng/file-channel/data
> agent.channels.fileChannel.transactionCapacity = 1000
> agent.channels.fileChannel.checkpointInterval = 30000
> agent.channels.fileChannel.maxFileSize = 2146435071
> agent.channels.fileChannel.minimumRequiredSpace = 524288000
> agent.channels.fileChannel.keep-alive = 5
> agent.channels.fileChannel.write-timeout = 10
> agent.channels.fileChannel.checkpoint-timeout = 600
> agent.channels.fileChannel.capacity = 50000000
>
> agent.sources.s2.batchSize = 1
> agent.sources.s2.channels = fileChannel
> agent.sources.s2.command = tail -F /var/log/creditcard/AggKeyListener.2.2013-01-19
> agent.sources.s2.interceptors = logIntercept
> agent.sources.s2.interceptors.logIntercept.appId = 153299
> agent.sources.s2.interceptors.logIntercept.env = SP
> agent.sources.s2.interceptors.logIntercept.hostName = vm-e61b-fe34.nam.nsroot.net
> agent.sources.s2.interceptors.logIntercept.logFileName = AggKeyListener.2.2013-01-19
> agent.sources.s2.interceptors.logIntercept.logFilePath = /var/log/creditcard/
> agent.sources.s2.interceptors.logIntercept.logType = creditcard log
> agent.sources.s2.interceptors.logIntercept.type = com.citi.sponge.flume.agent.source.LogInterceptor$Builder
> agent.sources.s2.type = exec
>
> agent.sources.s0.batchSize = 1
> agent.sources.s0.channels = fileChannel
> agent.sources.s0.command = tail -F /var/log/creditcard/AggKeyListener.0.2013-01-19
> agent.sources.s0.interceptors = logIntercept
> agent.sources.s0.interceptors.logIntercept.appId = 153299
> agent.sources.s0.interceptors.logIntercept.env = SP

Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
Later messages in this thread:
Kumar, Deepak8 2013-04-16, 18:36
Israel Ekpo 2013-04-16, 19:02
Brock Noland 2013-04-16, 19:17
Kumar, Deepak8 2013-04-17, 14:22