Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Flume >> mail # dev >> EventDrivenSource and dead threads


Copy link to this message
-
Re: EventDrivenSource and dead threads
What I described isn't really taking control of lifecycle.

What would happen is:
- source start. State=start
- OOM exception happens
- getLifecycleState called, override calls server.isServing(). This is a
lightweight call. If it returns false, it either calls stop() or local
lifeCycleState is set to ERROR (I'm not actually sure how this is
handled). The latter is probably the correct call in this case, but I'm
not entirely sure if the supervisor will ever restart it then.

The supervisor would still be doing any restarting. I guess it's really
a matter of who detects the bad state(getLifecycleState or a scheduled
runnable) and what that call is allowed to do(switch state to ERROR or
call stop()). When the source breaks, changing its status to accurately
reflect this is not breaking the paradigm(though calling stop() may be)

It looks to me like switching the state to ERROR should be sufficient...
Monitor should then try to start the source again. However not calling
stop() before that may cause resource leakage? Could someone confirm the
behavior in regards to ERROR?

On 01/17/2013 04:02 PM, Connor Woodson wrote:
> What I was trying to say is that the ScribeSource should not be responsible
> for restarting itself (just from my understanding of your idea; it breaks
> the existing paradigm as components should not have to control their own
> lifecycle). Going from Brock's link, I feel the most dynamic solution would
> be possible to just add in a lifecyle state RESTART and place it in that
> switch statement; when that state is reached, it tries to stop then restart
> the component and then sets the desired state to START (or STOP if it
> couldn't start it again, or set error=true if it couldn't stop it).
>
> And in a way to prevent overriding getLifecycleState to return RESTART,
> there could either be an inherited function from AbstractSource to call for
> a restart, there could maybe be an option to set it to RESTART on crash, or
> something else.
>
> I'll admit though that I know little about the lifecycle system, so I have
> no idea if this idea is any better.
>
> - Connor
>
>
> On Wed, Jan 16, 2013 at 10:21 PM, Juhani Connolly <
> [EMAIL PROTECTED]> wrote:
>
>> Sink, Source and Channel all extend LifecycleAware, so the function is
>> available to all components already.
>>
>> I was more questioning whether it was reasonable to start including logic
>> to determine the state. That being said, I think the precedent of just
>> returning the state set by start/stop is more one of habit, so on further
>> thought I don't see it as being unreasonable. I'm going to give fixing
>> ScribeSource with it a poke.
>>
>> As to new lifecycle states, I took a pass at reworking the lifecycle model
>> with guavas service implementation in the past, but it took some very
>> significant changes and didn't get the momentum/interest necessary to keep
>> working on it. That ticket is here https://issues.apache.org/**
>> jira/browse/FLUME-966 <https://issues.apache.org/jira/browse/FLUME-966> .
>> Brock might be working on it now though the issue doesn't appear to have
>> had attention since october.
>>
>>
>> On 01/17/2013 03:01 PM, Connor Woodson wrote:
>>
>>> Why limit it to the sources? If there is going to be a change to one
>>> component's lifecycle, then I see no reason not to change every
>>> component's
>>> lifecycle.
>>>
>>> Sinks and Channels could very well have this problem; so what about giving
>>> each LifecycleAware component a takePulse method (or something); or to
>>> avoid creating a new method, add a new lifecycle state 'crashed' or such
>>> which, when detected, causes a restart of the component. Then components
>>> would just need to override getLifecycleState (if this method is polled
>>> regularly; I don't know. maybe use a listener for when there's a state
>>> change) to detect if it has crashed/needs to be restarted.
>>>
>>> Just my thoughts,
>>>
>>> - Connor
>>>
>>>
>>> On Wed, Jan 16, 2013 at 9