Flume user mailing list: flume.EventDeliveryException: Failed to send events


Re: flume.EventDeliveryException: Failed to send events
Any chance of getting a thread dump (or a few) on both the source and
destination agent during an incident? :) It'd be a little work, but a
script could look for "Unable to deliver event. Exception follows." and do
the thread dumps.
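One rough way to do that, sketched below: watch the agent log for that message and run jstack when it appears. This assumes a single agent JVM per host, a log at /var/log/flume/flume.log, and jstack on the PATH; all of those are placeholders to adjust.

    #!/bin/bash
    # Sketch: capture a thread dump whenever the delivery-failure message is logged.
    # Assumes one Flume NG agent per host and the log path below (both placeholders).
    LOG=/var/log/flume/flume.log
    OUT=/tmp/flume-thread-dumps

    mkdir -p "$OUT"
    tail -F "$LOG" | grep --line-buffered "Unable to deliver event. Exception follows." |
    while read -r _; do
        PID=$(pgrep -f 'org.apache.flume.node.Application' | head -n 1)
        [ -n "$PID" ] && jstack "$PID" > "$OUT/jstack-$(date +%Y%m%d-%H%M%S).txt"
        # Avoid dumping repeatedly during a burst of failures.
        sleep 5
    done
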
On Tue, Apr 16, 2013 at 2:49 PM, Chris Neal <[EMAIL PROTECTED]> wrote:

> Thanks for all the input guys. :)
>
> @Hari:
> The FileChannel at the AvroSource is on a SAN disk, so I don't think it is
> the bottleneck in my case.  It is the same disk for both checkpoint and
> data.  My queue depth remains relatively stable around 2000, which doesn't
> bother me because of my batch size.
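For reference, splitting a FileChannel's checkpoint and data across disks comes down to two properties; the agent, channel, and path names below are placeholders, not taken from this setup.

    # Placeholder agent/channel/path names; the point is only that checkpointDir
    # and dataDirs can live on separate disks.
    collector.channels.fc1.type = file
    collector.channels.fc1.checkpointDir = /disk1/flume/checkpoint
    collector.channels.fc1.dataDirs = /disk2/flume/data,/disk3/flume/data
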
>
> On the AvroSink side, I have 126 ExecSources spread across 12 JVMs at 1GB
> heap each.  Each VM has 4 AvroSinks across 2 separate servers, load-balanced
> and round-robined (2 connections to each).  Is that a small enough
> number of connections to remove the thread parameter on the AvroSource?
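For concreteness, the pieces being discussed look roughly like this in an agent's properties file; all agent, sink, and source names are placeholders and channel wiring is omitted.

    # Upstream agent "app" (placeholder): two AvroSinks in a load-balancing,
    # round-robin sink group.
    app.sinkgroups = g1
    app.sinkgroups.g1.sinks = avro1 avro2
    app.sinkgroups.g1.processor.type = load_balance
    app.sinkgroups.g1.processor.selector = round_robin

    # Downstream agent "collector" (placeholder): with no "threads" property set,
    # the AvroSource does not cap its worker thread pool.
    collector.sources.avro-in.type = avro
    collector.sources.avro-in.bind = 0.0.0.0
    collector.sources.avro-in.port = 10000
    # collector.sources.avro-in.threads = 20   (the cap in question; leave unset to remove it)
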
>
> @Brock
> Each of my VMs on the downstream agents has a 4GB heap.  I watch the MBeans
> pretty closely via jconsole, and they sit around 3GB used.
>
> Thanks again for all the help!
>
>
> On Tue, Apr 16, 2013 at 2:26 PM, Brock Noland <[EMAIL PROTECTED]> wrote:
>
>> Another possibility is that the downstream agent is near capacity from a
>> memory perspective. What is your heap size for these agents?
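Heap for a Flume NG agent is normally set via JAVA_OPTS in conf/flume-env.sh. A minimal sketch follows; the 4g figure just mirrors the size mentioned elsewhere in this thread, and the JMX port is arbitrary.

    # conf/flume-env.sh (sketch): fixed 4 GB heap plus JMX so heap usage can be
    # watched from jconsole.
    export JAVA_OPTS="-Xms4g -Xmx4g \
      -Dcom.sun.management.jmxremote \
      -Dcom.sun.management.jmxremote.port=5445 \
      -Dcom.sun.management.jmxremote.authenticate=false \
      -Dcom.sun.management.jmxremote.ssl=false"
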
>>
>>
>> On Tue, Apr 16, 2013 at 2:19 PM, Hari Shreedharan <
>> [EMAIL PROTECTED]> wrote:
>>
>>>  One possibility is that you are hitting the file channel disk too hard
>>> - do you have just one disk for checkpoint and data? It might be getting
>>> slow because of this? Also you should probably just remove the thread limit
>>> on AvroSource. It usually does not cause too much havoc unless you have a
>>> massive number of connections causing too many threads.
>>>
>>> --
>>> Hari Shreedharan
>>>
>>> On Tuesday, April 16, 2013 at 12:07 PM, Chris Neal wrote:
>>>
>>> Thanks Hari.
>>>
>>> I increased both the connect and request timeouts to 40000ms, and I'm
>>> testing that now.  I am talking on a LAN though, which is part of the
>>> reason I'm concerned.  Seems like it might not actually be a network issue,
>>> but perhaps an overloaded AvroSource on the back end?
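For reference, the two AvroSink settings in question are plain sink properties; the agent, sink, and host names below are placeholders, and 40000 ms is the value being tested above.

    # Placeholder agent/sink names; both timeouts raised well above their defaults.
    app.sinks.avro1.type = avro
    app.sinks.avro1.hostname = collector01.example.com
    app.sinks.avro1.port = 10000
    app.sinks.avro1.connect-timeout = 40000
    app.sinks.avro1.request-timeout = 40000
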
>>>
>>>
>>> On Tue, Apr 16, 2013 at 1:52 PM, Hari Shreedharan <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>  Looks like you are hitting Avro IPC timeouts - you should probably
>>> increase them, especially if you are talking over a WAN.
>>>
>>> --
>>> Hari Shreedharan
>>>
>>> On Tuesday, April 16, 2013 at 11:38 AM, Chris Neal wrote:
>>>
>>> I'm seeing the same thing :)
>>>
>>> Mine is all on a local LAN though, so the fact that an RPC call doesn't
>>> reply in 10000ms or 20000ms is quite odd.  My configuration is for the most
>>> part the same as Denis' configuration: a two-tiered system, with ExecSources
>>> running tail -F on log files feeding an AvroSink, then an AvroSource, and
>>> loading into HDFS on the back tier.
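In config terms, that two-tier shape looks roughly like the sketch below. Hostnames, paths, and agent names are placeholders; timeouts, sink groups, and channel sizing are left out.

    # Tier 1 (placeholder agent "app"): tail a log and ship it via Avro.
    app.sources = tail1
    app.channels = ch1
    app.sinks = avro1
    app.sources.tail1.type = exec
    app.sources.tail1.command = tail -F /var/log/myapp/app.log
    app.sources.tail1.channels = ch1
    app.channels.ch1.type = file
    app.sinks.avro1.type = avro
    app.sinks.avro1.hostname = collector01.example.com
    app.sinks.avro1.port = 10000
    app.sinks.avro1.channel = ch1

    # Tier 2 (placeholder agent "collector"): receive Avro and write to HDFS.
    collector.sources = avro-in
    collector.channels = fc1
    collector.sinks = hdfs1
    collector.sources.avro-in.type = avro
    collector.sources.avro-in.bind = 0.0.0.0
    collector.sources.avro-in.port = 10000
    collector.sources.avro-in.channels = fc1
    collector.channels.fc1.type = file
    collector.sinks.hdfs1.type = hdfs
    collector.sinks.hdfs1.hdfs.path = /flume/events
    collector.sinks.hdfs1.channel = fc1
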
>>>
>>> I, too, see this on the AvroSink
>>>
>>> Either (A):
>>> [2013-04-15 23:57:14.827]
>>> [org.apache.flume.sink.LoadBalancingSinkProcessor] [ WARN]
>>> [SinkRunner-PollingRunner-LoadBalancingSinkProcessor] []
>>>  (LoadBalancingSinkProcessor.java:process:154) Sink failed to consume
>>> event. Attempting next sink if available.
>>> org.apache.flume.EventDeliveryException: Failed to send events
>>>         at org.apache.flume.sink.AvroSink.process(AvroSink.java:324)
>>>         at
>>> org.apache.flume.sink.LoadBalancingSinkProcessor.process(LoadBalancingSinkProcessor.java:151)
>>>         at
>>> org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>>         at java.lang.Thread.run(Thread.java:619)
>>> Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient {
>>> host: hadoopjt01.pegs.com, port: 10000 }: Failed to send batch
>>>         at
>>> org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:236)
>>>         at org.apache.flume.sink.AvroSink.process(AvroSink.java:308)
>>>         ... 3 more
>>> Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient {