Flume, mail # user - flume.EventDeliveryException: Failed to send events


Re: flume.EventDeliveryException: Failed to send events
Denis Lowe 2013-02-06, 18:10
I noticed that the logs in the destination server were reporting dropped
connections from the upstream server:

NettyServer$NettyServerAvroHandler.handleUpstream:171)  - [id: 0x08e56d76,
/SOURCE_HOST:43599 :> /LOCAL_HOST:4003] DISCONNECTED
NettyServer$NettyServerAvroHandler.handleUpstream:171)  - [id: 0x08e56d76,
/SOURCE_HOST:43599 :> /LOCAL_HOST:4003] UNBOUND
NettyServer$NettyServerAvroHandler.handleUpstream:171)  - [id: 0x08e56d76,
/SOURCE_HOST:43599 :> /LOCAL_HOST:4003] CLOSED
NettyServer$NettyServerAvroHandler.channelClosed:209)  - Connection to
/SOURCE_HOST:43599 disconnected.

The other thing I observed is that these errors only ever occur from our EU
servers (connecting to our centralized downstream collectors in US West).
We are running Flume in Amazon EC2.

I can see in the log that the connection is restored quite quickly.

Could the network latency between Europe and US West be causing the
connection between the two servers to 'appear' lost, thus resulting in the
above errors?

Are there any recommended config settings to compensate for this?
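
For reference, the stack trace quoted below reports "RPC request timed out"
rather than a connection failure, so the sink's request timeout may matter
more than connect-timeout. The following is only a sketch of the settings in
question, not a tested fix: the sink name "avroSink1" is hypothetical, the
values are illustrative, and it assumes the Avro sink in this Flume version
accepts request-timeout and batch-size (worth checking against the user guide
for the installed release).

```properties
# Hypothetical sink name "avroSink1"; the agent name "agent", host and port
# are taken from the config snapshot and stack trace in this thread.
agent.sinks.avroSink1.type = avro
agent.sinks.avroSink1.hostname = collector1
agent.sinks.avroSink1.port = 4003

# Time (ms) allowed for the initial connection; 40000 is the value already tried.
agent.sinks.avroSink1.connect-timeout = 40000

# Time (ms) allowed for each request after the first (assumed property,
# illustrative value). The "RPC request timed out" error suggests this limit.
agent.sinks.avroSink1.request-timeout = 60000

# Events sent per batch (assumed property, illustrative value). Smaller batches
# may be more likely to finish within the timeout on a high-latency link.
agent.sinks.avroSink1.batch-size = 50
```

Whether these changes reduce the errors on the EU to US West link is untested.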

On 6 February 2013 00:21, Juhani Connolly
<[EMAIL PROTECTED]> wrote:

>  Is there anything unusual in the logs for the destination (avroSource)
> server?
>
> Since the error is happening in the AvroSink, no data is getting lost. The
> failed batch is rolled back, so its removal from the local channel is
> cancelled, and the sink will attempt to resend it.
>
>
> On 02/06/2013 03:23 PM, Denis Lowe wrote:
>
> We are running Flume-NG 1.3.1 and have noticed periodically the following
> ERROR occurring (a few times daily):
>
>  We are using the File Channel connecting to 2 downstream collector
> agents in 'round_robin' mode, using avro source/sinks.
>
>  We are using the config described below to deliver 5 different log types
> (to 5 different ports downstream) and have observed the below
> error occurring randomly across all the ports.
>
>  We tried doubling the connect-timeout to 40000 (from the default of
> 20000) with no success.
> The agent appears to recover and keep on processing data.
>
>  My question is:
> Has this data been lost, or will Flume eventually retry until a
> successful delivery has been made?
> Are there any other config changes I can make to prevent/reduce
> this occurring in the future?
>
>  05 Feb 2013 23:12:21,650 ERROR
> [SinkRunner-PollingRunner-DefaultSinkProcessor]
> (org.apache.flume.SinkRunner$PollingRunner.run:160)  - Unable to deliver
> event. Exception follows.
> org.apache.flume.EventDeliveryException: Failed to send events
>         at org.apache.flume.sink.AvroSink.process(AvroSink.java:325)
>         at
> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>         at
> org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient {
> host: collector1, port: 4003 }: Failed to send batch
>         at
> org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:236)
>         at org.apache.flume.sink.AvroSink.process(AvroSink.java:309)
>         ... 3 more
> Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient {
> host: collector2, port: 4003 }: RPC request timed out
>         at
> org.apache.flume.api.NettyAvroRpcClient.waitForStatusOK(NettyAvroRpcClient.java:321)
>         at
> org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:295)
>         at
> org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:224)
>         ... 4 more
> Caused by: java.util.concurrent.TimeoutException
>         at org.apache.avro.ipc.CallFuture.get(CallFuture.java:132)
>         at
> org.apache.flume.api.NettyAvroRpcClient.waitForStatusOK(NettyAvroRpcClient.java:310)
>         ... 6 more
>
>  Below is a snapshot of the current config:
>
>  agent.sources.eventdata.command = tail -qn +0 -F
> /var/log/event-logs/live/eventdata.log