Flume >> mail # user >> flume.EventDeliveryException: Failed to send events


flume.EventDeliveryException: Failed to send events
We are running Flume-NG 1.3.1 and periodically see the ERROR shown further
below (a few times daily).

We are using the File Channel connecting to 2 downstream collector agents
in 'round_robin' mode, using avro source/sinks.

We are using the config described below to deliver 5 different log types
(to 5 different ports downstream) and have observed the below
error occurring randomly across all the ports.

We tried doubling the connect-timeout to 40000 (from the default of 20000)
with no success.
The agent appears to recover and keep on processing data.
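
A note on the timeout above: the trace further down ends in "RPC request timed out", which in the Avro sink is a separate setting from connect-timeout. A minimal sketch, assuming this Flume version supports the request-timeout property that the Flume user guide lists for the Avro sink (default 20000 ms); the value 40000 is only an illustration, not a tested recommendation:

```properties
# Assumption: request-timeout is honoured by the Avro sink in this release.
# connect-timeout covers the initial handshake; request-timeout covers later RPCs.
agent.sinks.eventdata-avro-sink-1.connect-timeout = 40000
agent.sinks.eventdata-avro-sink-1.request-timeout = 40000
agent.sinks.eventdata-avro-sink-2.connect-timeout = 40000
agent.sinks.eventdata-avro-sink-2.request-timeout = 40000
```

A longer request timeout may only hide a slow collector, so it is worth checking the downstream agents as well.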

My questions are:
Has this data been lost, or will Flume eventually retry until delivery
succeeds?
Are there any other config changes I can make to prevent or reduce
this in the future?

05 Feb 2013 23:12:21,650 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.SinkRunner$PollingRunner.run:160)  - Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Failed to send events
        at org.apache.flume.sink.AvroSink.process(AvroSink.java:325)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient { host: collector1, port: 4003 }: Failed to send batch
        at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:236)
        at org.apache.flume.sink.AvroSink.process(AvroSink.java:309)
        ... 3 more
Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient { host: collector2, port: 4003 }: RPC request timed out
        at org.apache.flume.api.NettyAvroRpcClient.waitForStatusOK(NettyAvroRpcClient.java:321)
        at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:295)
        at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:224)
        ... 4 more
Caused by: java.util.concurrent.TimeoutException
        at org.apache.avro.ipc.CallFuture.get(CallFuture.java:132)
        at org.apache.flume.api.NettyAvroRpcClient.waitForStatusOK(NettyAvroRpcClient.java:310)
        ... 6 more

Below is a snapshot of the current config:

agent.sources.eventdata.command = tail -qn +0 -F /var/log/event-logs/live/eventdata.log
agent.sources.eventdata.channels = eventdata-avro-channel
agent.sources.eventdata.batchSize = 10
agent.sources.eventdata.restart = true

## event source interceptor
agent.sources.eventdata.interceptors.host-interceptor.type = org.apache.flume.interceptor.HostInterceptor$Builder
agent.sources.eventdata.interceptors.host-interceptor.hostHeader = source-host
agent.sources.eventdata.interceptors.host-interceptor.useIP = false

## eventdata channel
agent.channels.eventdata-avro-channel.type = file
agent.channels.eventdata-avro-channel.checkpointDir = /mnt/flume-ng/checkpoint/eventdata-avro-channel
agent.channels.eventdata-avro-channel.dataDirs = /mnt/flume-ng/data1/eventdata-avro-channel,/mnt/flume-ng/data2/eventdata-avro-channel
agent.channels.eventdata-avro-channel.maxFileSize = 210000000
agent.channels.eventdata-avro-channel.capacity = 2000000
agent.channels.eventdata-avro-channel.checkpointInterval = 300000
agent.channels.eventdata-avro-channel.transactionCapacity = 10000
agent.channels.eventdata-avro-channel.keep-alive = 20
agent.channels.eventdata-avro-channel.write-timeout = 20

## 2 x downstream click sinks for load balancing and failover
agent.sinks.eventdata-avro-sink-1.type = avro
agent.sinks.eventdata-avro-sink-1.channel = eventdata-avro-channel
agent.sinks.eventdata-avro-sink-1.hostname = collector1
agent.sinks.eventdata-avro-sink-1.port = 4003
agent.sinks.eventdata-avro-sink-1.batch-size = 100
agent.sinks.eventdata-avro-sink-1.connect-timeout = 40000

agent.sinks.eventdata-avro-sink-2.type = avro
agent.sinks.eventdata-avro-sink-2.channel = eventdata-avro-channel
agent.sinks.eventdata-avro-sink-2.hostname = collector2
agent.sinks.eventdata-avro-sink-2.port = 4003
agent.sinks.eventdata-avro-sink-2.batch-size = 100
agent.sinks.eventdata-avro-sink-2.connect-timeout = 40000

## load balance config
agent.sinkgroups = eventdata-avro-sink-group
agent.sinkgroups.eventdata-avro-sink-group.sinks = eventdata-avro-sink-1 eventdata-avro-sink-2
agent.sinkgroups.eventdata-avro-sink-group.processor.type = load_balance
agent.sinkgroups.eventdata-avro-sink-group.processor.selector = round_robin

Denis.
Juhani Connolly 2013-02-06, 08:21
Denis Lowe 2013-02-06, 18:10
Chris Neal 2013-04-16, 18:38
Hari Shreedharan 2013-04-16, 18:52
Chris Neal 2013-04-16, 19:07
Hari Shreedharan 2013-04-16, 19:19
Brock Noland 2013-04-16, 19:26
Chris Neal 2013-04-16, 19:49
Brock Noland 2013-04-16, 19:57
Chris Neal 2013-04-16, 20:05