Re: flume.EventDeliveryException: Failed to send events
I noticed that the logs in the destination server were reporting dropped
connections from the upstream server:

NettyServer$NettyServerAvroHandler.handleUpstream:171)  - [id: 0x08e56d76,
/SOURCE_HOST:43599 :> /LOCAL_HOST:4003] DISCONNECTED
NettyServer$NettyServerAvroHandler.handleUpstream:171)  - [id: 0x08e56d76,
/SOURCE_HOST:43599 :> /LOCAL_HOST:4003] UNBOUND
NettyServer$NettyServerAvroHandler.handleUpstream:171)  - [id: 0x08e56d76,
/SOURCE_HOST:43599 :> /LOCAL_HOST:4003] CLOSED
NettyServer$NettyServerAvroHandler.channelClosed:209)  - Connection to
/SOURCE_HOST:43599 disconnected.

The other thing I observed is that these errors only ever occur from our EU
servers (connecting to our centralized downstream collectors in US West).
We are running Flume on Amazon EC2.

I can see in the log that the connection is restored quite quickly.

I gather that the network latency between Europe and US West is causing the
connection between the 2 servers to 'appear' lost, thus resulting in the
above errors?

Are there any recommended config settings to compensate for this?
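
For reference, the knobs I'm assuming are relevant here are the AvroSink
timeouts (the sink name below is a placeholder and the values are only
illustrative):

agent.sinks.COLLECTOR_SINK.connect-timeout = 40000
agent.sinks.COLLECTOR_SINK.request-timeout = 60000

Given the 'RPC request timed out' in the stack trace, I'd guess
request-timeout is the setting that matters, since we have only raised
connect-timeout so far.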

On 6 February 2013 00:21, Juhani Connolly <[EMAIL PROTECTED]> wrote:

>  Is there anything unusual in the logs for the destination (avroSource)
> server?
>
> Since the error is happening in the AvroSink, no data is getting lost. The
> failed batch will be rolled back, its removal from the local channel is
> cancelled, and the sink will attempt to resend it.
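>
> In code terms, a sink's delivery attempt is roughly this (a simplified
> sketch, not the actual AvroSink source; sendOverAvro is just a placeholder
> for the Avro RPC call):
>
> Transaction txn = channel.getTransaction();
> txn.begin();
> try {
>     Event event = channel.take();  // the event stays in the channel until commit
>     sendOverAvro(event);           // placeholder send; may throw on an RPC timeout
>     txn.commit();                  // only now is the event removed from the channel
> } catch (Throwable t) {
>     txn.rollback();                // the event remains in the channel and is retried
>     throw new EventDeliveryException("Failed to send events", t);
> } finally {
>     txn.close();
> }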
>
>
> On 02/06/2013 03:23 PM, Denis Lowe wrote:
>
> We are running Flume-NG 1.3.1 and have noticed the following ERROR occurring
> periodically (a few times daily):
>
>  We are using the File Channel, connecting to 2 downstream collector
> agents in 'round_robin' mode via Avro sources/sinks.
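>
> That is, a load-balancing sink processor along these lines (the sink and
> group names here are placeholders, not our literal config):
>
> agent.sinkgroups = collectors
> agent.sinkgroups.collectors.sinks = COLLECTOR1_SINK COLLECTOR2_SINK
> agent.sinkgroups.collectors.processor.type = load_balance
> agent.sinkgroups.collectors.processor.selector = round_robin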
>
>  We are using the config described below to deliver 5 different log types
> (to 5 different ports downstream) and have observed the below
> error occurring randomly across all the ports.
>
>  We tried doubling the connect-timeout to 40000 (from the default of
> 20000) with no success.
> The agent appears to recover and keep on processing data.
>
>  My questions are:
> Has this data been lost, or will Flume eventually retry until a
> successful delivery has been made?
> Are there any other config changes I can make to prevent/reduce
> this from occurring in the future?
>
>  05 Feb 2013 23:12:21,650 ERROR
> [SinkRunner-PollingRunner-DefaultSinkProcessor]
> (org.apache.flume.SinkRunner$PollingRunner.run:160)  - Unable to deliver
> event. Exception follows.
> org.apache.flume.EventDeliveryException: Failed to send events
>         at org.apache.flume.sink.AvroSink.process(AvroSink.java:325)
>         at
> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>         at
> org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient {
> host: collector1, port: 4003 }: Failed to send batch
>         at
> org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:236)
>         at org.apache.flume.sink.AvroSink.process(AvroSink.java:309)
>         ... 3 more
> Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient {
> host: collector2, port: 4003 }: RPC request timed out
>         at
> org.apache.flume.api.NettyAvroRpcClient.waitForStatusOK(NettyAvroRpcClient.java:321)
>         at
> org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:295)
>         at
> org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:224)
>         ... 4 more
> Caused by: java.util.concurrent.TimeoutException
>         at org.apache.avro.ipc.CallFuture.get(CallFuture.java:132)
>         at
> org.apache.flume.api.NettyAvroRpcClient.waitForStatusOK(NettyAvroRpcClient.java:310)
>         ... 6 more
>
>  Below is a snapshot of the current config:
>
>  agent.sources.eventdata.command = tail -qn +0 -F
> /var/log/event-logs/live/eventdata.log