Kafka, mail # user - Fetch request with correlation id 1171437 from client ReplicaFetcherThread-0-1 on partition [meetme,0] failed due to Leader not local for partition


Re: Fetch request with correlation id 1171437 from client ReplicaFetcherThread-0-1 on partition [meetme,0] failed due to Leader not local for partition
Joel Koshy 2013-06-28, 21:28
Leader election occurs when brokers are bounced or lose their
zookeeper registration. Do you have a state-change.log on your
brokers? Also can you see what's in the following zk paths:
get /brokers/topics/meetme
get /brokers/topics/meetme/partitions/0/state
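Those checks can be run with the zookeeper-shell.sh script that ships with Kafka; a sketch (the "zkhost:2181" connect string is a placeholder for your ZooKeeper ensemble):

```shell
# Read the topic and partition-state znodes Joel asks about.
# zookeeper-shell.sh accepts commands on stdin once connected.
bin/zookeeper-shell.sh zkhost:2181 <<'EOF'
get /brokers/topics/meetme
get /brokers/topics/meetme/partitions/0/state
EOF
# The partition-state znode holds JSON along the lines of:
#   {"controller_epoch":3,"leader":1,"leader_epoch":2,"isr":[1,2],"version":1}
# A "leader" of -1 (or a missing state znode) indicates that no leader is
# currently elected for the partition.
```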

On Fri, Jun 28, 2013 at 1:40 PM, Vadim Keylis <[EMAIL PROTECTED]> wrote:
> Joel, my problem, after your explanation, is that the leader for some reason
> did not get elected and the exception has been thrown for hours now. What is
> the best way to force leader election for that partition?
>
> Vadim
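Kafka 0.8 includes a preferred-replica election tool that asks the controller to re-elect leaders for specific partitions; a sketch (host name and file path are placeholders, and the exact JSON shape can vary by 0.8 build):

```shell
# Describe which partitions should have their preferred replica elected leader.
cat > /tmp/election.json <<'EOF'
{"partitions": [{"topic": "meetme", "partition": 0}]}
EOF

# Ask the controller (via ZooKeeper) to run the election.
bin/kafka-preferred-replica-election.sh --zookeeper zkhost:2181 \
  --path-to-json-file /tmp/election.json
```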
>
>
> On Fri, Jun 28, 2013 at 12:26 PM, Joel Koshy <[EMAIL PROTECTED]> wrote:
>
>> Just wanted to clarify: the topic.metadata.refresh.interval.ms would apply
>> to producers - and mainly with ack = 0. (If ack = 1, then a metadata
>> request would be issued on this exception although even with ack > 0 it is
>> useful to have the metadata refresh for refreshing information about how
>> many partitions are available.)
>>
>> For replica fetchers (Vadim's case) the exceptions would persist until the
>> new leader for the replica in question is elected. It should not take too
>> long. When the leader is elected, the controller will send out an RPC to
>> the new leaders and followers and the above exceptions will go away.
>>
>> Also, to answer your question: the "right" way to shutdown an 0.8 cluster
>> is to use controlled shutdown. That will not eliminate the exceptions, but
>> they are more for informative purposes and are non-fatal (i.e., the logging
>> can probably be improved a bit).
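The controlled shutdown Joel recommends migrates leadership off a broker before it stops. A sketch for 0.8 (broker id and host are placeholders; depending on the exact 0.8 build, the same behavior may instead be enabled via controlled.shutdown.enable in server.properties so that a plain SIGTERM triggers the migration):

```shell
# Move partition leadership off broker 1 before stopping it, using the
# admin tool bundled with Kafka 0.8.
bin/kafka-run-class.sh kafka.admin.ShutdownBroker \
  --zookeeper zkhost:2181 --broker 1
```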
>>
>>
>>
>> On Fri, Jun 28, 2013 at 11:47 AM, David DeMaagd <[EMAIL PROTECTED]
>> >wrote:
>>
>> > Unless I'm misreading something, that is controlled by the
>> > topic.metadata.refresh.interval.ms variable (defaults to 10 minutes),
>> > and I've not seen it run longer than that (unless there were other
>> > problems going on besides that).
>> >
>> > I would check the JMX values for things under
>> > "kafka.server":type="ReplicaManager",
>> > particularly UnderReplicatedPartitions and possibly the ISR
>> > Expand/Shrinks values - those could indicate a problem on the brokers
>> > that is preventing things from settling down completely.  Might also
>> > look and see if you are doing any heavy GCs (which can cause zookeeper
>> > connection issues, which would then complicate the ISR election stuff).
>> >
>> > --
>> > Dave DeMaagd
>> > [EMAIL PROTECTED] | 818 262 7958
>> >
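The ReplicaManager beans David points at can be polled with Kafka's bundled JmxTool; a sketch (the JMX port 9999 is a placeholder and assumes the broker was started with JMX_PORT set):

```shell
# Poll UnderReplicatedPartitions on one broker; a sustained non-zero value
# suggests the cluster has not settled after the restart.
bin/kafka-run-class.sh kafka.tools.JmxTool \
  --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi \
  --object-name '"kafka.server":type="ReplicaManager",name="UnderReplicatedPartitions"'
```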
>> > ([EMAIL PROTECTED] - Fri, Jun 28, 2013 at 11:32:42AM -0700)
>> > > David, what is the expected time frame for the exception to continue?
>> > > It's been an hour since the short downtime and I still see the
>> > > exception in the kafka service logs.
>> > >
>> > > Thanks,
>> > > Vadim
>> > >
>> > >
>> > > On Fri, Jun 28, 2013 at 11:25 AM, David DeMaagd <[EMAIL PROTECTED]
>> > >wrote:
>> > >
>> > > > Getting kafka.common.NotLeaderForPartitionException for a time after
>> > > > a node is brought back on line (especially if it was a short
>> > > > downtime) is normal - that is because the consumers have not yet
>> > > > completely picked up the new leader information.  It should settle
>> > > > shortly.
>> > > >
>> > > > --
>> > > > Dave DeMaagd
>> > > > [EMAIL PROTECTED] | 818 262 7958
>> > > >
>> > > > ([EMAIL PROTECTED] - Fri, Jun 28, 2013 at 11:08:46AM -0700)
>> > > > > I want to clarify that I restarted only one kafka node; all the
>> > > > > others were running and did not require a restart.
>> > > > >
>> > > > >
>> > > > > On Fri, Jun 28, 2013 at 10:57 AM, Vadim Keylis <[EMAIL PROTECTED]> wrote:
>> > > > >
>> > > > > > Good morning. I have a cluster of 3 kafka nodes. They were all
>> > > > > > running at the time. I needed to make a configuration change in
>> > > > > > the property file and restart kafka. I don't have the broker
>> > > > > > shutdown tool, so I simply used pkill -TERM -u ${KAFKA_USER} -f
>> > > > > > kafka.Kafka. That suddenly caused the exception.