Flume >> mail # user >> flume.EventDeliveryException: Failed to send events


Denis Lowe 2013-02-06, 06:23
Juhani Connolly 2013-02-06, 08:21
Denis Lowe 2013-02-06, 18:10
Chris Neal 2013-04-16, 18:38
Hari Shreedharan 2013-04-16, 18:52
Chris Neal 2013-04-16, 19:07
Hari Shreedharan 2013-04-16, 19:19
Brock Noland 2013-04-16, 19:26
Chris Neal 2013-04-16, 19:49

Re: flume.EventDeliveryException: Failed to send events
Any chance of getting a thread dump (or a few) on both the source and
destination agent during an incident? :) It'd be a little work, but a
script could look for "Unable to deliver event. Exception follows." and do
the thread dumps.
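
A rough sketch of such a script, in Python. It assumes the JDK's jstack tool is on the PATH and that you pass the agent's log file and process id yourself; the 30-second gap between dumps, the output directory name, and the function names are illustrative choices, not anything Flume provides. It does not handle log rotation.

```python
import subprocess
import sys
import time
from pathlib import Path

MARKER = "Unable to deliver event. Exception follows."
MIN_GAP_SECONDS = 30.0  # assumption: avoid one dump per log line during a burst


def should_dump(line, last_dump, now, min_gap=MIN_GAP_SECONDS):
    """True when the line contains the marker and the last dump is old enough."""
    return MARKER in line and (last_dump is None or now - last_dump >= min_gap)


def dump_threads(pid, out_dir):
    """Write `jstack <pid>` output to a timestamped file and return its path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"threads-{pid}-{time.strftime('%Y%m%dT%H%M%S')}.txt"
    result = subprocess.run(["jstack", str(pid)], capture_output=True, text=True)
    target.write_text(result.stdout + result.stderr)
    return target


def follow(path):
    """Yield lines appended to the file, similar to `tail -f`."""
    with open(path, "r", errors="replace") as handle:
        handle.seek(0, 2)  # start at the end of the file
        while True:
            line = handle.readline()
            if line:
                yield line
            else:
                time.sleep(0.5)


def main(log_path, pid, out_dir="thread-dumps"):
    last_dump = None
    for line in follow(log_path):
        now = time.monotonic()
        if should_dump(line, last_dump, now):
            print("wrote", dump_threads(pid, Path(out_dir)))
            last_dump = now


if __name__ == "__main__":
    if len(sys.argv) >= 3:
        main(sys.argv[1], int(sys.argv[2]))
    else:
        print("usage: dump_on_error.py <flume.log> <agent-pid>")
```

Run one copy next to each agent (source side and destination side) so the dumps from both ends can be lined up by timestamp. Lowering MIN_GAP_SECONDS gives several dumps per incident, which makes it easier to see whether the same threads stay blocked.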
On Tue, Apr 16, 2013 at 2:49 PM, Chris Neal <[EMAIL PROTECTED]> wrote:

> Thanks for all the input guys. :)
>
> @Hari:
> The FileChannel at the AvroSource is on a SAN disk, so I don't think it is
> the bottleneck in my case.  It is the same disk for both checkpoint and
> data.  My queue depth remains relatively stable around 2000, which doesn't
> bother me because of my batch size.
>
> On the AvroSink side, I have 126 ExecSources spread across 12 JVMs at 1GB
> heap each.  Each JVM has 4 AvroSinks across 2 separate servers, load
> balanced round-robin (2 connections to each).  Is that a small enough
> number of connections to remove the thread parameter on the AvroSource?
>
> @Brock
> Each of the JVMs on my downstream agents has a 4GB heap.  I watch the MBeans
> pretty closely via jconsole, and they sit around 3GB used.
>
> Thanks again for all the help!
>
>
> On Tue, Apr 16, 2013 at 2:26 PM, Brock Noland <[EMAIL PROTECTED]> wrote:
>
>> Another possibility is that the downstream agent is near capacity from a
>> memory perspective. What is your heap size for these agents?
>>
>>
>> On Tue, Apr 16, 2013 at 2:19 PM, Hari Shreedharan <
>> [EMAIL PROTECTED]> wrote:
>>
>>>  One possibility is that you are hitting the file channel disk too hard
>>> - do you have just one disk for checkpoint and data? It might be getting
>>> slow because of that. Also, you should probably just remove the thread limit
>>> on AvroSource. It usually does not cause too much havoc unless you have a
>>> massive number of connections causing too many threads.
>>>
>>> --
>>> Hari Shreedharan
>>>
>>> On Tuesday, April 16, 2013 at 12:07 PM, Chris Neal wrote:
>>>
>>> Thanks Hari.
>>>
>>> I increased both the connect and request timeouts to 40000ms, and I'm
>>> testing that now.  I am talking on a LAN though, which is part of the
>>> reason I'm concerned.  Seems like it might not actually be a network issue,
>>> but perhaps an overloaded AvroSource on the back end?
>>>
>>>
>>> On Tue, Apr 16, 2013 at 1:52 PM, Hari Shreedharan <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>  Looks like you are hitting Avro IPC timeouts - you should probably
>>> increase them, especially if you are talking over WAN.
>>>
>>> --
>>> Hari Shreedharan
>>>
>>> On Tuesday, April 16, 2013 at 11:38 AM, Chris Neal wrote:
>>>
>>> I'm seeing the same thing :)
>>>
>>> Mine is all on a local LAN though, so the fact that an RPC call doesn't
>>> reply in 10000ms or 20000ms is quite odd.  My configuration is for the most
>>> part the same as Denis' configuration: a two-tiered system, with ExecSources
>>> running tail -F on log files to an AvroSink, to an AvroSource, loading into
>>> HDFS on the back tier.
>>>
>>> I, too, see this on the AvroSink
>>>
>>> Either (A):
>>> [2013-04-15 23:57:14.827]
>>> [org.apache.flume.sink.LoadBalancingSinkProcessor] [ WARN]
>>> [SinkRunner-PollingRunner-LoadBalancingSinkProcessor] []
>>>  (LoadBalancingSinkProcessor.java:process:154) Sink failed to consume
>>> event. Attempting next sink if available.
>>> org.apache.flume.EventDeliveryException: Failed to send events
>>>         at org.apache.flume.sink.AvroSink.process(AvroSink.java:324)
>>>         at
>>> org.apache.flume.sink.LoadBalancingSinkProcessor.process(LoadBalancingSinkProcessor.java:151)
>>>         at
>>> org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>>         at java.lang.Thread.run(Thread.java:619)
>>> Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient {
>>> host: hadoopjt01.pegs.com, port: 10000 }: Failed to send batch
>>>         at
>>> org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:236)
>>>         at org.apache.flume.sink.AvroSink.process(AvroSink.java:308)
>>>         ... 3 more
>>> Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient {
Chris Neal 2013-04-16, 20:05