Re: Fatal issue (was RE: 0.8 throwing exception "Failed to find leader" and high-level consumer fails to make progress)
Hmm, that's a good theory. My understanding is that you have one thread
that first shuts down the consumer connector and then creates new streams
on the same connector. Is that right? If so, I don't think the race
condition can happen. When we shut down the consumer connector, it waits
until the leaderFinder thread is stopped. So, if the leaderFinder thread is
still being started, shutdown will block.
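
To make the blocking behavior concrete, here is a minimal sketch of a worker thread whose shutdown waits on a latch until the run loop has exited. This is an illustration only, not the actual Kafka code: the names ShutdownSketch, Worker, and shutdownWaitsForWorker are hypothetical, and the sleep stands in for real work.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

class ShutdownSketch {

    /** Hypothetical worker; a stand-in for a shutdownable background thread. */
    static class Worker extends Thread {
        private final AtomicBoolean running = new AtomicBoolean(true);
        private final CountDownLatch stopped = new CountDownLatch(1);
        private volatile boolean loopExited = false;

        @Override
        public void run() {
            try {
                while (running.get()) {
                    try {
                        Thread.sleep(10); // placeholder for real work
                    } catch (InterruptedException e) {
                        // Woken by shutdown(); the loop condition decides whether to exit.
                    }
                }
            } finally {
                loopExited = true;
                stopped.countDown(); // signal that the loop is done
            }
        }

        /** Blocks until run() has finished. Assumes start() was already called. */
        void shutdown() throws InterruptedException {
            running.set(false);
            interrupt();
            stopped.await();
        }

        boolean hasExited() {
            return loopExited;
        }
    }

    /** Starts a worker, shuts it down, and reports whether the loop had exited by then. */
    static boolean shutdownWaitsForWorker() {
        Worker worker = new Worker();
        worker.start();
        try {
            worker.shutdown();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return worker.hasExited();
    }

    public static void main(String[] args) {
        System.out.println("worker exited before shutdown returned: "
                + shutdownWaitsForWorker());
    }
}
```

In this sketch, shutdown() cannot return while the worker is still starting or running, because the latch is only released at the very end of run(). Note that if shutdown() were called on a worker that is never started, it would wait indefinitely.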

Thanks,

Jun
On Tue, Jul 30, 2013 at 9:34 AM, Hargett, Phil <
[EMAIL PROTECTED]> wrote:

> Hmmm...is there a reason that stopConnections in ConsumerFetcherManager
> does not grab a lock before shutting down the leaderFinderThread?
>
> I don't see what prevents startConnections/stopConnections from causing a
> race under certain conditions if they are called on separate threads.
>
> Given there are no locks, it even seems possible that the
> ZookeeperConsumerConnector could get all the way through its shutdown
> (including successfully calling stopConnections on ConsumerFetcherManager)
> before the leaderFinderThread has been able to start up. In that scenario,
> the leaderFinderThread would start up and immediately fail, because the
> ZkClient has already been closed.
>
> That is the behavior I am seeing: leaderFinderThread fails because
> ZkClient is hitting a NullPointerException, presumably because the
> ZkClient is already closed.
>
> Don't know if that is the cause but it could be.
>
> :)
>
> On Jul 30, 2013, at 12:01 PM, "Jun Rao" <[EMAIL PROTECTED]> wrote:
>
> What's the revision of the 0.8 branch that you used? If that's older than
> the beta1 release, I recommend that you upgrade.
>
> Thanks,
>
> Jun
>
>
> On Tue, Jul 30, 2013 at 3:09 AM, Hargett, Phil <[EMAIL PROTECTED]> wrote:
> No, sorry, it didn't take 90 seconds to connect to ZK (at least I hope
> not). I had my consumer open for 90 secs in this case before shutting it
> down and disposing of it—hence any races caused by fast startup/shutdown
> should not have been relevant.
>
> I build from source off of the 0.8 branch, so isn't that pretty close to
> beta 1?
>
> :)
>
> On Jul 30, 2013, at 12:22 AM, "Jun Rao" <[EMAIL PROTECTED]> wrote:
>
> Hmm, it takes 90 secs to connect to ZK? That seems way too long. Is your
> ZK healthy?
>
> Also, are you on the 0.8 beta1 release? If not, could you try that one? It
> may not be related, but we did fix some consumer side deadlock issues there.
>
> Thanks,
>
> Jun
>
>
> On Mon, Jul 29, 2013 at 9:02 AM, Hargett, Phil <[EMAIL PROTECTED]> wrote:
> I think we have 3 different classes in play here:
>
>  * kafka.consumer.ZookeeperConsumerConnector
>  * kafka.utils.ShutdownableThread
>  * kafka.consumer.ConsumerFetcherManager
>
> The actual consumer is the first one, and it does a fair amount of work
> *before* stopping the fetcher, which then results in shutting down the
> leader thread.
>
> If the initial connectZk method in ZookeeperConsumerConnector takes a long
> time (more than 90 seconds in 1 case this morning), then I could see the
> fetcher's stopConnections method not getting called, because there isn't a
> ConsumerFetcherManager instance yet.
>
> In other words, we could be shutting down the consumer before it is fully
> initialized—but that doesn't seem correct, as the code in
> ZookeeperConsumerConnector is synchronous—my application wouldn't have a
> reference to a partially initialized consumer.
>
> Odd.
>
> :)
>
> On Jul 29, 2013, at 11:22 AM, "Jun Rao" <[EMAIL PROTECTED]> wrote:
>
> There seem to be two separate issues.
>
> 1. Why do you see NullPointerException in the leaderFinder thread? I am