Flume >> mail # dev >> EventDrivenSource and dead threads


Re: EventDrivenSource and dead threads
Why limit it to the sources? If there is going to be a change to one
component's lifecycle, then I see no reason not to change every component's
lifecycle.

Sinks and Channels could very well have this problem too. So what about
giving each LifecycleAware component a takePulse method (or something
similar)? Or, to avoid creating a new method, add a new lifecycle state such
as 'crashed' which, when detected, causes the component to be restarted.
Components would then just need to override getLifecycleState to report
that they have crashed and need to be restarted. (That assumes the method is
polled regularly, which I don't know; otherwise a listener could be notified
whenever the state changes.)
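A minimal sketch of the 'crashed' state idea, under stated assumptions: ComponentState, SupervisedComponent, ThreadBackedSource and Supervisor are hypothetical stand-ins, not Flume's LifecycleAware/LifecycleState API (Flume has no CRASHED state). The component overrides getLifecycleState so that "started, but the worker thread is dead" reads as CRASHED, and one supervisor pass restarts it.

```java
// Hypothetical lifecycle states; Flume's real LifecycleState has no CRASHED value.
enum ComponentState { IDLE, START, STOP, CRASHED }

// Hypothetical stand-in for LifecycleAware.
interface SupervisedComponent {
    void start();
    void stop();
    ComponentState getLifecycleState();
}

// A source whose events are produced by a single worker thread.
class ThreadBackedSource implements SupervisedComponent {
    private final Runnable body;
    private volatile Thread worker;
    private volatile ComponentState state = ComponentState.IDLE;

    ThreadBackedSource(Runnable body) {
        this.body = body;
    }

    @Override
    public synchronized void start() {
        Thread t = new Thread(body, "source-worker");
        t.setDaemon(true);
        worker = t;
        t.start();
        state = ComponentState.START;   // set only after the thread is running
    }

    @Override
    public synchronized void stop() {
        Thread t = worker;
        if (t != null) {
            t.interrupt();
        }
        state = ComponentState.STOP;
    }

    // Overridden state check: if we were started but the worker thread has
    // died (e.g. from an uncaught OutOfMemoryError), report CRASHED, not START.
    @Override
    public ComponentState getLifecycleState() {
        ComponentState s = state;
        Thread t = worker;
        if (s == ComponentState.START && t != null && !t.isAlive()) {
            return ComponentState.CRASHED;
        }
        return s;
    }
}

final class Supervisor {
    private Supervisor() {}

    // One pass of a hypothetical supervisor check: restart a crashed component.
    static boolean restartIfCrashed(SupervisedComponent c) {
        if (c.getLifecycleState() == ComponentState.CRASHED) {
            c.stop();
            c.start();
            return true;
        }
        return false;
    }
}
```

Because the check only reads existing state, a component that never overrides getLifecycleState behaves exactly as before; only components that opt in can ever report CRASHED.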

Just my thoughts,

- Connor
On Wed, Jan 16, 2013 at 9:08 PM, Juhani Connolly <[EMAIL PROTECTED]> wrote:

> Hmm, overriding the implementation of getLifecycleState provided by
> AbstractSource could work, though it would go against the convention that
> has been maintained in all the other components (that I can think of).
>
>
> On 01/17/2013 01:20 PM, Brock Noland wrote:
>
>> Hi,
>>
>> Yes, I can definitely see the issue. It sucks that we'd have to add yet
>> another thread. An alternative that wouldn't require another thread
>> would be to check the optional interface in the supervisor,
>> approximately here:
>>
>> https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java#L240
>>
>> However, I am not sold on the supervisor being the best place to fix
>> this as I am not sure that other lifecycle components would need this.
>>
>>
>>
>> Brock
>>
>> On Wed, Jan 16, 2013 at 7:45 PM, Juhani Connolly
>> <[EMAIL PROTECTED]> wrote:
>>
>>> I came upon an issue with ScribeSource, though it's theoretically
>>> applicable to any EventDrivenSource whose event-generating thread(s) die.
>>> Simply put, sending a bad packet to the Thrift (Scribe protocol) port
>>> makes it try to allocate space for an arbitrarily large packet, resulting
>>> in an OutOfMemoryError that kills the thread. (Incidentally, I thought
>>> this would be an issue in Avro too, but it throws an exception before
>>> making excessive allocation requests.)
>>>
>>> As far as Flume is concerned, the component is still alive: stop() was
>>> never called, so even monitoring the component state over JMX will not
>>> reveal anything wrong. This particular case arises from user error, but
>>> other errors could also leave a zombie component. I think it would be
>>> more user-friendly to be able to recover from such errors.
>>>
>>> I'm thinking of adding a StatusPollable interface that EventDrivenSources
>>> can optionally implement (because we can't change the interface without a
>>> version change). If it is implemented, the EventDrivenSourceRunner would
>>> schedule a regular poll to check the state. Upon failure it could call
>>> stop() to signal that the source broke. With autoRestartPolicy, the
>>> source would then get restarted by its supervisor.
>>>
>>> I'd appreciate any opinions before I put together a patch or post an
>>> issue.
>>>
>>
>>
>>
>
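A minimal sketch of the StatusPollable proposal from the original message, assuming the interface exposes a single boolean isHealthy() method. StatusPollable, PolledSource and PollingRunner are hypothetical names for illustration, not Flume classes. Sources that do not implement the optional interface are never polled, which keeps the existing interface unchanged; a source that fails a poll is stopped once, leaving the restart to the supervisor's policy.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical opt-in interface from the proposal; not part of Flume's API.
interface StatusPollable {
    boolean isHealthy();
}

// Minimal stand-in for an EventDrivenSource.
interface PolledSource {
    void start();
    void stop();
}

// Hypothetical runner: polls sources that opt in via StatusPollable and calls
// stop() once when a poll fails, so a supervisor's restart policy can take over.
final class PollingRunner {
    private final PolledSource source;
    private final ScheduledExecutorService poller =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "status-poller");
                t.setDaemon(true);
                return t;
            });
    private volatile boolean stoppedByPoller = false;

    PollingRunner(PolledSource source) {
        this.source = source;
    }

    void start(long periodMs) {
        source.start();
        if (source instanceof StatusPollable) {
            StatusPollable pollable = (StatusPollable) source;
            poller.scheduleWithFixedDelay(() -> {
                boolean healthy;
                try {
                    healthy = pollable.isHealthy();
                } catch (RuntimeException e) {
                    healthy = false;   // a throwing check counts as a failure
                }
                if (!healthy && !stoppedByPoller) {
                    stoppedByPoller = true;
                    source.stop();     // report the failure through the normal lifecycle
                }
            }, periodMs, periodMs, TimeUnit.MILLISECONDS);
        }
    }

    boolean stoppedByPoller() {
        return stoppedByPoller;
    }

    void shutdown() {
        poller.shutdownNow();
    }
}
```

The health check is wrapped in try/catch because an exception escaping a task scheduled on a ScheduledExecutorService suppresses all of its later runs, which would silently turn the poller itself into a dead thread. The same instanceof check could instead live in the supervisor, as suggested earlier in the thread, which avoids the extra poller thread.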