Re: flume.EventDeliveryException: Failed to send events
Yeah, I can script something up.  Maybe increasing the timeouts was all it
needed and you'll never see another reply to this thread... ;)  But, if
not, I'll post the thread dump next.
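
For anyone following the thread: a minimal sketch of the kind of watcher script
Brock describes, written here in Python. The log path, agent PID, and dump
location are hypothetical and would need to be adapted; the trigger string is
the one the failing sink logs.

    # Sketch only: watch the agent log and take a thread dump whenever the
    # sink logs the failure message. Assumes jstack is on the PATH.
    import subprocess
    import time

    FLUME_LOG = "/var/log/flume-ng/flume.log"   # hypothetical log path
    FLUME_PID = "12345"                         # PID of the Flume agent JVM
    TRIGGER = "Unable to deliver event. Exception follows."

    with open(FLUME_LOG) as log:
        log.seek(0, 2)                  # start at the end, like tail -F
        while True:
            line = log.readline()
            if not line:
                time.sleep(0.5)
                continue
            if TRIGGER in line:
                out = "/tmp/flume-threaddump-%d.txt" % int(time.time())
                with open(out, "w") as dump:
                    subprocess.call(["jstack", FLUME_PID], stdout=dump)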

Thanks again for all the suggestions.
Chris
On Tue, Apr 16, 2013 at 2:57 PM, Brock Noland <[EMAIL PROTECTED]> wrote:

> Any chance of getting a thread dump (or a few) on both the source and
> destination agent during an incident? :) It'd be a little work, but a
> script could look for "Unable to deliver event. Exception follows." and
> do the thread dumps.
>
>
> On Tue, Apr 16, 2013 at 2:49 PM, Chris Neal <[EMAIL PROTECTED]> wrote:
>
>> Thanks for all the input guys. :)
>>
>> @Hari:
>> The FileChannel at the AvroSource is on a SAN disk, so I don't think it
>> is the bottleneck in my case.  It is the same disk for both checkpoint and
>> data.  My queue depth remains relatively stable around 2000, which doesn't
>> bother me because of my batch size.
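
For context, a file channel with checkpoint and data on the same mount is
defined along these lines (agent name, channel name, and paths are hypothetical):

    agent.channels = fc1
    agent.channels.fc1.type = file
    # Both directories on the same SAN mount, as described above.
    agent.channels.fc1.checkpointDir = /san/flume/checkpoint
    agent.channels.fc1.dataDirs = /san/flume/data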
>>
>> On the AvroSink side, I have 126 ExecSources spread across 12 JVMs at 1GB
>> heap each.  Each VM has 4 AvroSinks across 2 separate servers, load
>> balanced and round robined (2 connections to each).  Is that a small enough
>> number of connections to remove the thread parameter on the AvroSource?
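
A sketch of the layout being described: Avro sinks in a load-balancing,
round-robin sink group on the upstream agent, and an Avro source with no
thread cap on the downstream agent. Names, hosts, ports, and the channel
binding are hypothetical.

    # Upstream agent: two Avro sinks, load balanced round-robin.
    agent.sinkgroups = g1
    agent.sinkgroups.g1.sinks = avro1 avro2
    agent.sinkgroups.g1.processor.type = load_balance
    agent.sinkgroups.g1.processor.selector = round_robin

    agent.sinks.avro1.type = avro
    agent.sinks.avro1.hostname = collector-a.example.com
    agent.sinks.avro1.port = 4141
    agent.sinks.avro1.channel = ch1
    agent.sinks.avro2.type = avro
    agent.sinks.avro2.hostname = collector-b.example.com
    agent.sinks.avro2.port = 4141
    agent.sinks.avro2.channel = ch1

    # Downstream agent: leaving "threads" unset removes the worker-thread cap.
    collector.sources.avroSrc.type = avro
    collector.sources.avroSrc.bind = 0.0.0.0
    collector.sources.avroSrc.port = 4141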
>>
>> @Brock
>> Each of my VMs on the downstream agents is a 4GB heap.  I watch the
>> MBeans pretty closely via jconsole, and they sit around 3GB used.
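
For reference, heap size and the remote JMX access used with jconsole are
typically set via JAVA_OPTS in conf/flume-env.sh; the port below is arbitrary
and the flags are standard JVM options:

    # conf/flume-env.sh on the downstream agent (sketch).
    JAVA_OPTS="-Xms4g -Xmx4g \
      -Dcom.sun.management.jmxremote \
      -Dcom.sun.management.jmxremote.port=5445 \
      -Dcom.sun.management.jmxremote.authenticate=false \
      -Dcom.sun.management.jmxremote.ssl=false"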
>>
>> Thanks again for all the help!
>>
>>
>> On Tue, Apr 16, 2013 at 2:26 PM, Brock Noland <[EMAIL PROTECTED]> wrote:
>>
>>> Another possibility is that the downstream agent is near capacity from a
>>> memory perspective. What is your heap size for these agents?
>>>
>>>
>>> On Tue, Apr 16, 2013 at 2:19 PM, Hari Shreedharan <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>>  One possibility is that you are hitting the file channel disk too hard
>>>> - do you have just one disk for checkpoint and data? It might be getting
>>>> slow because of this? Also you should probably just remove the thread limit
>>>> on AvroSource. It usually does not cause too much havoc unless you have a
>>>> massive number of connections causing too many threads.
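
Concretely, the split Hari describes would look something like this in the
channel definition (paths are hypothetical, one physical disk each):

    agent.channels.fc1.checkpointDir = /disk1/flume/checkpoint
    agent.channels.fc1.dataDirs = /disk2/flume/data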
>>>>
>>>> --
>>>> Hari Shreedharan
>>>>
>>>> On Tuesday, April 16, 2013 at 12:07 PM, Chris Neal wrote:
>>>>
>>>> Thanks Hari.
>>>>
>>>> I increased both the connect and request timeouts to 40000ms, and I'm
>>>> testing that now.  I am talking on a LAN though, which is part of the
>>>> reason I'm concerned.  Seems like it might not actually be a network issue,
>>>> but perhaps an overloaded AvroSource on the back end?
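
In configuration terms, the change described here is the two timeout
properties on the Avro sink (values in milliseconds; sink name hypothetical):

    agent.sinks.avro1.connect-timeout = 40000
    agent.sinks.avro1.request-timeout = 40000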
>>>>
>>>>
>>>> On Tue, Apr 16, 2013 at 1:52 PM, Hari Shreedharan <
>>>> [EMAIL PROTECTED]> wrote:
>>>>
>>>>  Looks like you are hitting Avro IPC timeouts - you should probably
>>>> increase them, especially if you are talking over a WAN.
>>>>
>>>> --
>>>> Hari Shreedharan
>>>>
>>>> On Tuesday, April 16, 2013 at 11:38 AM, Chris Neal wrote:
>>>>
>>>> I'm seeing the same thing :)
>>>>
>>>> Mine is all on a local LAN though, so the fact that an RPC call doesn't
>>>> reply in 10000ms or 20000ms is quite odd.  My configuration is for the most
>>>> part the same as Denis' configuration.  Two tiered system, ExecSources
>>>> running tail -F on log files to an AvroSink, to an AvroSource, loading into
>>>> HDFS on the back tier.
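
A skeleton of the two-tier layout being described; agent names, paths, and
ports are hypothetical, and channels are omitted for brevity:

    # Tier 1 (application host): tail a log file into an Avro sink.
    tier1.sources.tailSrc.type = exec
    tier1.sources.tailSrc.command = tail -F /var/log/app/app.log
    tier1.sinks.avroOut.type = avro
    tier1.sinks.avroOut.hostname = collector-a.example.com
    tier1.sinks.avroOut.port = 4141

    # Tier 2 (collector): Avro source feeding an HDFS sink.
    tier2.sources.avroIn.type = avro
    tier2.sources.avroIn.bind = 0.0.0.0
    tier2.sources.avroIn.port = 4141
    tier2.sinks.hdfsOut.type = hdfs
    tier2.sinks.hdfsOut.hdfs.path = hdfs://namenode/flume/events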
>>>>
>>>> I, too, see this on the AvroSink
>>>>
>>>> Either (A):
>>>> [2013-04-15 23:57:14.827]
>>>> [org.apache.flume.sink.LoadBalancingSinkProcessor] [ WARN]
>>>> [SinkRunner-PollingRunner-LoadBalancingSinkProcessor] []
>>>>  (LoadBalancingSinkProcessor.java:process:154) Sink failed to consume
>>>> event. Attempting next sink if available.
>>>> org.apache.flume.EventDeliveryException: Failed to send events
>>>>         at org.apache.flume.sink.AvroSink.process(AvroSink.java:324)
>>>>         at
>>>> org.apache.flume.sink.LoadBalancingSinkProcessor.process(LoadBalancingSinkProcessor.java:151)
>>>>         at
>>>> org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>>>         at java.lang.Thread.run(Thread.java:619)