Kafka, mail # user - Re: 0.8 throwing exception "Failed to find leader" and high-level consumer fails to make progress


Jun Rao 2013-06-25, 16:45
Hargett, Phil 2013-06-25, 17:08
Jun Rao 2013-06-26, 03:59
Re: 0.8 throwing exception "Failed to find leader" and high-level consumer fails to make progress
Jun Rao 2013-07-22, 04:53
Do you see ZK session expiration in the consumer and the broker log (search
for Expired)? Also, could you try the 0.8 beta1 release?

Thanks,

Jun
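
The log check suggested above (searching for "Expired") can be scripted. What follows is only a minimal sketch: the function name `find_expired` and the sample log line used in testing are illustrative assumptions, not taken from Kafka or ZooKeeper, and the match is a plain case-sensitive substring search rather than anything aware of the log format.

```python
import sys
from typing import Iterable, List, Tuple


def find_expired(lines: Iterable[str], needle: str = "Expired") -> List[Tuple[int, str]]:
    """Return (1-based line number, line without trailing newline) for every
    line that contains `needle`. The search is case-sensitive."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line
    ]


if __name__ == "__main__":
    # Usage: pass one or more log file paths (consumer and broker logs).
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for number, line in find_expired(handle):
                print(f"{path}:{number}: {line}")
```

Running it against both the consumer-side and broker-side log files lets the timestamps of any matches be compared with the time the "Failed to find leader" exception starts appearing.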
On Tue, Jul 16, 2013 at 11:56 AM, Hargett, Phil <
[EMAIL PROTECTED]> wrote:

> Hmm....
>
> This issue continues to emerge occasionally, albeit less often than in the
> past.
>
> If I hit it after several days or months of uptime, that would be okay,
> but today I have hit it twice within the first hour of 2 separate load
> tests.
>
> I've cleaned up the code in my application to ensure I do not start / stop
> consumers rapidly.  In the most recent case, a consumer had been in use for
> several minutes before being shut down, and this stack trace still emerged.
>
> For me, it's not harmless, because this exception is on a background
> thread that continues to spin wildly (continually hitting this exception
> rather than aborting) long after I've shut down and disposed of my consumer.
> I never have a chance to intercept it, because I never receive the
> exception in my code.
>
> The only remedy is to restart my application, which seems very undesirable.
>
> I'm using a recent build of Kafka 0.8 pulled from the 0.8 branch within
> the last month; actually, I built it on June 25, the date of this original
> thread.
>
> Thoughts?
> ________________________________________
> From: Jun Rao [[EMAIL PROTECTED]]
> Sent: Tuesday, June 25, 2013 11:58 PM
> To: [EMAIL PROTECTED]
> Subject: Re: 0.8 throwing exception "Failed to find leader" and high-level
> consumer fails to make progress
>
> The exception is likely due to a race condition between the logic in the ZK
> watcher and the closing of the ZK connection. It's harmless, except for the
> weird exception.
>
> Thanks,
>
> Jun
>
>
> On Tue, Jun 25, 2013 at 10:07 AM, Hargett, Phil <
> [EMAIL PROTECTED]> wrote:
>
> > Possibly.
> >
> > > I see evidence that it's being stopped / started every 30 seconds in some
> > > cases (due to my code). It's entirely possible that I have a race, too, in
> > > that 2 separate pieces of code could be triggering such a stop / start.
> >
> > Gives me something to track down. Thank you!!
> >
> > On Jun 25, 2013, at 12:45 PM, "Jun Rao" <[EMAIL PROTECTED]> wrote:
> >
> > > > This typically only happens when the consumerConnector is shut down. Are
> > > > you restarting the consumerConnector often?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Tue, Jun 25, 2013 at 9:40 AM, Hargett, Phil <
> > > [EMAIL PROTECTED]> wrote:
> > >
> > >> Seeing this exception a LOT (3-4 times per second, same log topic).
> > >>
> > > >> I'm using external code to feed data to about 50 different log topics
> > > >> over a cluster of 3 Kafka 0.8 brokers.  There are 3 ZooKeeper instances
> > > >> as well; all of this is running on EC2.  My application creates a
> > > >> high-level consumer (1 per topic) to consume data from each and do
> > > >> further processing.
> > > >>
> > > >> The problem is this exception is in the high-level consumer, so my code
> > > >> has no way of knowing that it's become stuck.
> > > >>
> > > >> This exception does not always appear, but as far as I can tell, once
> > > >> this happens, the only cure is to restart my application's process.
> > > >>
> > > >> I saw this in 0.8 built from source about 1 week ago, and also am seeing
> > > >> it today after pulling the latest 0.8 sources and rebuilding Kafka.
> > >>
> > >> Thoughts?
> > >>
> > > >> Failed to find leader for Set([topic6,0]): java.lang.NullPointerException
> > > >>        at org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:416)
> > > >>        at org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:413)
> > > >>        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
> > > >>        at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:413)
> > > >>        at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:409)
> > > >>        at kafka.utils.ZkUtils$.getChildrenParentMayNotExist(ZkUtils.scala:438)
>