Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Flume >> mail # dev >> EventDrivenSource and dead threads


Copy link to this message
-
Re: EventDrivenSource and dead threads
Sink, Source and Channel all extend LifecycleAware, so the function is
available to all components already.

I was more questioning whether it was reasonable to start including
logic to determine the state. That being said, I think the precedent of
just returning the state set by start/stop is more one of habit, so on
further thought I don't see it as being unreasonable. I'm going to give
fixing ScribeSource with it a poke.

As to new lifecycle states, I took a pass at reworking the lifecycle
model with guavas service implementation in the past, but it took some
very significant changes and didn't get the momentum/interest necessary
to keep working on it. That ticket is here
https://issues.apache.org/jira/browse/FLUME-966 . Brock might be working
on it now though the issue doesn't appear to have had attention since
october.

On 01/17/2013 03:01 PM, Connor Woodson wrote:
> Why limit it to the sources? If there is going to be a change to one
> component's lifecycle, then I see no reason not to change every component's
> lifecycle.
>
> Sinks and Channels could very well have this problem; so what about giving
> each LifecycleAware component a takePulse method (or something); or to
> avoid creating a new method, add a new lifecycle state 'crashed' or such
> which, when detected, causes a restart of the component. Then components
> would just need to override getLifecycleState (if this method is polled
> regularly; I don't know. maybe use a listener for when there's a state
> change) to detect if it has crashed/needs to be restarted.
>
> Just my thoughts,
>
> - Connor
>
>
> On Wed, Jan 16, 2013 at 9:08 PM, Juhani Connolly <
> [EMAIL PROTECTED]> wrote:
>
>> Hmm, overriding the implementation of getLifecycleState provided by
>> AbstractSource could work. It would be going against the convention that
>> has been maintained in all other components(that I can think of)
>>
>>
>> On 01/17/2013 01:20 PM, Brock Noland wrote:
>>
>>> Hi,
>>>
>>> Yes I can definitely see the issue. It sucks that we'd have to add yet
>>> another thread. An alternative which wouldn't require another thread
>>> would be to check the optional interface in the supervisor,
>>> approximately here:
>>>
>>> https://github.com/apache/**flume/blob/trunk/flume-ng-**
>>> core/src/main/java/org/apache/**flume/lifecycle/**
>>> LifecycleSupervisor.java#L240<https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java#L240>
>>>
>>> However, I am not sold on the supervisor being the best place to fix
>>> this as I am not sure that other lifecycle components would need this.
>>>
>>>
>>>
>>> Brock
>>>
>>> On Wed, Jan 16, 2013 at 7:45 PM, Juhani Connolly
>>> <juhani_connolly@cyberagent.**co.jp <[EMAIL PROTECTED]>>
>>> wrote:
>>>
>>>> I came upon an issue with ScribeSource,  though it's theoretically
>>>> applicable to any EventDrivenSource whose event generating thread(s) die.
>>>> Simple put, sending a bad packet to the thrift(scribe protocol) port will
>>>> result in it trying to allocate space for some arbitrarily large packet
>>>> resulting in an OOMException which kills the thread(incidentally I
>>>> thought
>>>> this would be an issue in avro too, but it throws an exception before
>>>> making
>>>> excessive allocation requests).
>>>>
>>>> As far as flume is concerned, the component is still alive. stop() was
>>>> never
>>>> called, so even monitoring the component state using jmx will not notice
>>>> anything wrong. This situation occurs from user error, but there is
>>>> potential for other errors leaving a zombie component. I think it would
>>>> be
>>>> more user friendly to be able to recover from such errors.
>>>>
>>>> I'm thinking of adding a StatusPollable interface that EventDrivenSources
>>>> can optionally implement(because we can't change the interface without a
>>>> version change). If implemented, the EventDrivenSourceRunner would
>>>> schedule
>>>> a regular poll to check the state. Upon failure it could either call
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB