Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Flume >> mail # dev >> EventDrivenSource and dead threads


Copy link to this message
-
Re: EventDrivenSource and dead threads
I don't know if this is within the scope of this change, but sinks
sometimes will need to be restarted; for instance, I had an HDFS sink crash
from an Out of Memory error (caused by the JIRA I filed; I wasn't planning
on using a large heap) and it doesn't automatically restart; I don't know
if there's a nice way to detect of a sink/source has crashed, but if so it
would be nice to have them auto-restart when they go down.

- Connor
On Thu, Jan 17, 2013 at 12:27 AM, Juhani Connolly <
[EMAIL PROTECTED]> wrote:

> What I described isn't really taking control of lifecycle.
>
> What would happen is:
> - source start. State=start
> - OOM exception happens
> - getLifecycleState called, override calls server.isServing(). This is a
> lightweight call. If it returns false, it either calls stop() or local
> lifeCycleState is set to ERROR (I'm not actually sure how this is handled).
> The latter is probably the correct call in this case, but I'm not entirely
> sure if the supervisor will ever restart it then.
>
> The supervisor would still be doing any restarting. I guess it's really a
> matter of who detects the bad state(getLifecycleState or a scheduled
> runnable) and what that call is allowed to do(switch state to ERROR or call
> stop()). When the source breaks, changing its status to accurately reflect
> this is not breaking the paradigm(though calling stop() may be)
>
> It looks to me like switching the state to ERROR should be sufficient...
> Monitor should then try to start the source again. However not calling
> stop() before that may cause resource leakage? Could someone confirm the
> behavior in regards to ERROR?
>
>
> On 01/17/2013 04:02 PM, Connor Woodson wrote:
>
>> What I was trying to say is that the ScribeSource should not be
>> responsible
>> for restarting itself (just from my understanding of your idea; it breaks
>> the existing paradigm as components should not have to control their own
>> lifecycle). Going from Brock's link, I feel the most dynamic solution
>> would
>> be possible to just add in a lifecyle state RESTART and place it in that
>> switch statement; when that state is reached, it tries to stop then
>> restart
>> the component and then sets the desired state to START (or STOP if it
>> couldn't start it again, or set error=true if it couldn't stop it).
>>
>> And in a way to prevent overriding getLifecycleState to return RESTART,
>> there could either be an inherited function from AbstractSource to call
>> for
>> a restart, there could maybe be an option to set it to RESTART on crash,
>> or
>> something else.
>>
>> I'll admit though that I know little about the lifecycle system, so I have
>> no idea if this idea is any better.
>>
>> - Connor
>>
>>
>> On Wed, Jan 16, 2013 at 10:21 PM, Juhani Connolly <
>> [EMAIL PROTECTED].**jp <[EMAIL PROTECTED]>>
>> wrote:
>>
>>  Sink, Source and Channel all extend LifecycleAware, so the function is
>>> available to all components already.
>>>
>>> I was more questioning whether it was reasonable to start including logic
>>> to determine the state. That being said, I think the precedent of just
>>> returning the state set by start/stop is more one of habit, so on further
>>> thought I don't see it as being unreasonable. I'm going to give fixing
>>> ScribeSource with it a poke.
>>>
>>> As to new lifecycle states, I took a pass at reworking the lifecycle
>>> model
>>> with guavas service implementation in the past, but it took some very
>>> significant changes and didn't get the momentum/interest necessary to
>>> keep
>>> working on it. That ticket is here https://issues.apache.org/**
>>> jira/browse/FLUME-966 <https://issues.apache.org/**jira/browse/FLUME-966<https://issues.apache.org/jira/browse/FLUME-966>>
>>> .
>>>
>>> Brock might be working on it now though the issue doesn't appear to have
>>> had attention since october.
>>>
>>>
>>> On 01/17/2013 03:01 PM, Connor Woodson wrote:
>>>
>>>  Why limit it to the sources? If there is going to be a change to one