Kafka >> mail # user >> Questions regarding broker


Re: Questions regarding broker
Hey Calvin,

I apologize for not being able to get to this sooner. I don't think I
can reproduce the full scenario exactly as I don't have exclusive
access to so many machines, but I tried it locally and couldn't
reproduce it. Any chance you can reproduce it with a smaller
deployment? Is step 6 required? Would you mind pasting the full stack
trace that you saw?

Thanks,

Joel
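
For readers following the report quoted below: one plausible reading of the "key not found" failure (an assumption, not something confirmed in this thread) is a map lookup on a broker id that stale topic metadata still names as leader after that broker has gone away. A minimal Python sketch of that failure mode follows. It is an illustration only, not Kafka client code; `resolve_leaders`, `resolve_leaders_safely`, `live_brokers`, `partition_leaders`, and the host names are all hypothetical.

```python
# Hypothetical illustration only; this is not Kafka client code.

def resolve_leaders(partition_leaders, live_brokers):
    """Map each partition to the (host, port) of its leader broker.

    Raises KeyError when a leader id is missing from live_brokers, which is
    analogous to the NoSuchElementException a JVM map lookup can throw.
    """
    return {p: live_brokers[leader] for p, leader in partition_leaders.items()}


def resolve_leaders_safely(partition_leaders, live_brokers):
    """Same lookup, but report partitions with an unknown leader instead of failing."""
    resolved, unresolved = {}, []
    for partition, leader in partition_leaders.items():
        if leader in live_brokers:
            resolved[partition] = live_brokers[leader]
        else:
            unresolved.append(partition)
    return resolved, unresolved


# Assumed state: broker 5 was shut down, but the metadata still names it
# as the leader of partition 1.
live_brokers = {1: ("host1", 9092), 2: ("host2", 9092)}
partition_leaders = {0: 1, 1: 5}
```

In this sketch, `resolve_leaders(partition_leaders, live_brokers)` fails on broker 5, while `resolve_leaders_safely` returns the resolvable partitions plus a list of the ones that need a metadata refresh.
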
On Wed, Jul 10, 2013 at 11:10 PM, Joel Koshy <[EMAIL PROTECTED]> wrote:
> Ok thanks - I'll go through this tomorrow.
>
> Joel
>
> On Wed, Jul 10, 2013 at 9:14 PM, Calvin Lei <[EMAIL PROTECTED]> wrote:
>> Joel,
>>    So I was able to reproduce the issue that I experienced. Please see the
>> steps below.
>> 1. Set up a 3-zookeeper and 6-broker cluster. Set up one topic with 2
>> partitions, with replication factor set to 3.
>> 2. Set up and run the console consumer, consuming messages from that topic.
>> 3. Produce a few messages to confirm the consumer is working.
>> 4. Stop the consumer.
>> 5. Shut down (uncontrolled) the leader broker of one of the partitions.
>> 6. Shut down one of the zookeepers.
>> 7. Run the list topic script to confirm a new leader has been elected.
>> 8. Bring up the console consumer again.
>> 9. Console consumer won't start because of error in rebalancing (when
>> fetching topic metadata).
>>      Error: java.util.NoSuchElementException: Key Not Found (5).
>>      Trace: Client.Util.Scala:67
>>
>> Where broker 5 was the lead broker I shut down. I am using 0.8 beta.
>>
>> thanks,
>> Cal
>>
>>
>> On Tue, Jul 9, 2013 at 11:20 PM, Calvin Lei <[EMAIL PROTECTED]> wrote:
>>
>>> I will try to reproduce it. It was sporadic. My setup was a topic with 1
>>> partition and replication factor = 3.
>>> If I kill the console producer and then shut down the leader broker, a new
>>> leader is elected. If I then kill the new leader, I don't see the last
>>> broker get elected as leader. When I then tried starting the console
>>> producer, I started seeing errors.
>>>
>>>
>>>
>>>
>>> On Tue, Jul 9, 2013 at 6:14 PM, Joel Koshy <[EMAIL PROTECTED]> wrote:
>>>
>>>> Not really - if you shut down a leader broker (and assuming your
>>>> replication factor is > 1) then the other assigned replica will be
>>>> elected as the new leader. The producer would then look up metadata,
>>>> find the new leader and send requests to it. What do you see in the
>>>> logs?
>>>>
>>>> Joel
>>>>
>>>> On Tue, Jul 9, 2013 at 1:44 PM, Calvin Lei <[EMAIL PROTECTED]> wrote:
>>>> > Thanks, you gave me enough pointers to dig deeper. And I tested the fault
>>>> > tolerance by shutting down brokers randomly.
>>>> >
>>>> > What I noticed is if I shut down brokers while my producer and consumer
>>>> > are still running, they recover fine. However, if I shut down a leader
>>>> > broker without a running producer, I can't seem to start the producer
>>>> > afterwards without restarting the previous leader broker. Is this
>>>> > expected?
>>>> > On Jul 9, 2013 10:28 AM, "Joel Koshy" <[EMAIL PROTECTED]> wrote:
>>>> >
>>>> >> For 1 I forgot to add - there is an admin tool to reassign replicas
>>>> >> but it would take longer than leader failover.
>>>> >>
>>>> >> Joel
>>>> >>
>>>> >> On Tuesday, July 9, 2013, Joel Koshy wrote:
>>>> >>
>>>> >> > 1 - no, unless broker4 is not the preferred leader. (The preferred
>>>> >> > leader is the first broker in the assigned replica list). If a
>>>> >> > non-preferred replica is the current leader you can run the
>>>> >> > PreferredReplicaLeaderElection admin command to move the leader.
>>>> >> > 2 - The latency of the actual leader movement (on leader failover) is
>>>> >> > fairly low - probably of the order of tens of ms. However, clients
>>>> >> > (producers, consumers) may take longer to detect that (a client needs
>>>> >> > to get back an error response, handle an exception, issue a metadata
>>>> >> > request, get the response to find the new leader, and all that can
>>>> >> > add up but it should not be terribly high - I'm guessing on the order
>>>> >> > of a few hundred ms
>>>
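
Points 1 and 2 in the oldest quoted reply above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not Kafka's client or admin API: `NotLeaderError`, `preferred_leader`, `needs_preferred_election`, `send_with_failover`, the callback names, and the replica list are all hypothetical. It shows the rule that the preferred leader is the first broker in the assigned replica list, and the client-side sequence of error response, exception handling, metadata request, and retry against the new leader.

```python
# Hypothetical sketch; none of these names come from Kafka's API.

class NotLeaderError(Exception):
    """Raised by a broker asked to serve a partition it does not lead."""


def preferred_leader(assigned_replicas):
    """The preferred leader is the first broker in the assigned replica list."""
    return assigned_replicas[0]


def needs_preferred_election(assigned_replicas, current_leader):
    """True when a non-preferred replica currently leads the partition."""
    return current_leader != preferred_leader(assigned_replicas)


def send_with_failover(message, cached_leader, fetch_leader, send_to, max_attempts=3):
    """Send to the cached leader; on an error, refresh metadata and retry.

    fetch_leader() stands in for the metadata request and returns a leader id.
    send_to(leader, message) delivers the message or raises NotLeaderError.
    Returns (leader that accepted the message, number of attempts used).
    """
    leader = cached_leader
    for attempt in range(1, max_attempts + 1):
        try:
            send_to(leader, message)
            return leader, attempt
        except NotLeaderError:
            # The error response is how the client learns the leader moved.
            leader = fetch_leader()
    raise RuntimeError("no leader accepted the message")


# Simulated cluster: the client cached broker 4, but leadership moved to 6.
cluster = {"leader": 6}


def fetch_leader():
    return cluster["leader"]


def send_to(leader, message):
    if leader != cluster["leader"]:
        raise NotLeaderError(leader)


result = send_with_failover("hello", 4, fetch_leader, send_to)
```

Each extra attempt costs one failed request plus one metadata round trip, which is why client-side detection can take longer than the leader movement itself. With an assumed replica list of `[4, 5, 6]`, `needs_preferred_election([4, 5, 6], 6)` is true, which is the situation where the preferred replica election command mentioned in point 1 would move leadership back to broker 4.
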