Why limit it to the sources? If there is going to be a change to one
component's lifecycle, then I see no reason not to change every component's
Sinks and Channels could very well have this problem; so what about giving
each LifecycleAware component a takePulse method (or something); or to
avoid creating a new method, add a new lifecycle state 'crashed' or such
which, when detected, causes a restart of the component. Then components
would just need to override getLifecycleState (if this method is polled
regularly; I don't know. maybe use a listener for when there's a state
change) to detect if it has crashed/needs to be restarted.
Just my thoughts,
On Wed, Jan 16, 2013 at 9:08 PM, Juhani Connolly <
[EMAIL PROTECTED]> wrote:
> Hmm, overriding the implementation of getLifecycleState provided by
> AbstractSource could work. It would be going against the convention that
> has been maintained in all other components(that I can think of)
> On 01/17/2013 01:20 PM, Brock Noland wrote:
>> Yes I can definitely see the issue. It sucks that we'd have to add yet
>> another thread. An alternative which wouldn't require another thread
>> would be to check the optional interface in the supervisor,
>> approximately here:
>> However, I am not sold on the supervisor being the best place to fix
>> this as I am not sure that other lifecycle components would need this.
>> On Wed, Jan 16, 2013 at 7:45 PM, Juhani Connolly
>> <juhani_connolly@cyberagent.**co.jp <[EMAIL PROTECTED]>>
>>> I came upon an issue with ScribeSource, though it's theoretically
>>> applicable to any EventDrivenSource whose event generating thread(s) die.
>>> Simple put, sending a bad packet to the thrift(scribe protocol) port will
>>> result in it trying to allocate space for some arbitrarily large packet
>>> resulting in an OOMException which kills the thread(incidentally I
>>> this would be an issue in avro too, but it throws an exception before
>>> excessive allocation requests).
>>> As far as flume is concerned, the component is still alive. stop() was
>>> called, so even monitoring the component state using jmx will not notice
>>> anything wrong. This situation occurs from user error, but there is
>>> potential for other errors leaving a zombie component. I think it would
>>> more user friendly to be able to recover from such errors.
>>> I'm thinking of adding a StatusPollable interface that EventDrivenSources
>>> can optionally implement(because we can't change the interface without a
>>> version change). If implemented, the EventDrivenSourceRunner would
>>> a regular poll to check the state. Upon failure it could either call
>>> to signal it broke. With autoRestartPolicy, the source would then get
>>> restarted by its supervisor.
>>> Would appreciate any opinions before I put together a patch/post an