Kafka >> mail # user >> Questions regarding broker


Re: Questions regarding broker
Hey Calvin,

I apologize for not being able to get to this sooner. I don't think I
can reproduce the full scenario exactly as I don't have exclusive
access to so many machines, but I tried it locally and couldn't
reproduce it. Any chance you can reproduce it with a smaller
deployment? Is step 6 required? Would you mind pasting the full stack
trace that you saw?

Thanks,

Joel
On Wed, Jul 10, 2013 at 11:10 PM, Joel Koshy <[EMAIL PROTECTED]> wrote:
> Ok thanks - I'll go through this tomorrow.
>
> Joel
>
> On Wed, Jul 10, 2013 at 9:14 PM, Calvin Lei <[EMAIL PROTECTED]> wrote:
>> Joel,
>>    So I was able to reproduce the issue that I experienced. Please see the
>> steps below.
>> 1. Set up a 3-node ZooKeeper ensemble and a 6-broker cluster. Set up one
>> topic with 2 partitions, with replication factor set to 3.
>> 2. Set up and run the console consumer, consuming messages from that topic.
>> 3. Produce a few messages to confirm the consumer is working.
>> 4. Stop the consumer.
>> 5. Shut down (uncontrolled) the leader broker of one of the partitions.
>> 6. Shut down one of the ZooKeeper nodes.
>> 7. Run the list topic script to confirm a new leader has been elected.
>> 8. Bring up the console consumer again.
>> 9. The console consumer won't start because of an error in rebalancing
>> (when fetching topic metadata).
>>      Error: Java.util.NoSuchElementException: Key Not Found (5).
>>      Trace: Client.Util.Scala:67
>>
>> Where broker 5 was the lead broker I shut down. I am using 0.8 beta.
>>
>> thanks,
>> Cal
>>
>>
>> On Tue, Jul 9, 2013 at 11:20 PM, Calvin Lei <[EMAIL PROTECTED]> wrote:
>>
>>> I will try to reproduce it. It was sporadic. My setup was a topic with 1
>>> partition and replication factor = 3.
>>> If I kill the console producer and then shut down the leader broker, a new
>>> leader is elected. If I then kill the new leader, I don't see the last
>>> broker get elected as leader. When I then tried starting the console
>>> producer, I started seeing errors.
>>>
>>>
>>>
>>>
>>> On Tue, Jul 9, 2013 at 6:14 PM, Joel Koshy <[EMAIL PROTECTED]> wrote:
>>>
>>>> Not really - if you shutdown a leader broker (and assuming your
>>>> replication factor is > 1) then the other assigned replica will be
>>>> elected as the new leader. The producer would then look up metadata,
>>>> find the new leader and send requests to it. What do you see in the
>>>> logs?
>>>>
>>>> Joel
>>>>
>>>> On Tue, Jul 9, 2013 at 1:44 PM, Calvin Lei <[EMAIL PROTECTED]> wrote:
>>>> > Thanks, you gave me enough pointers to dig deeper. I also tested the
>>>> > fault tolerance by shutting down brokers randomly.
>>>> >
>>>> > What I noticed is that if I shut down brokers while my producer and
>>>> > consumer are still running, they recover fine. However, if I shut down
>>>> > a leader broker without a running producer, I can't seem to start the
>>>> > producer afterwards without restarting the previous leader broker. Is
>>>> > this expected?
>>>> > On Jul 9, 2013 10:28 AM, "Joel Koshy" <[EMAIL PROTECTED]> wrote:
>>>> >
>>>> >> For 1, I forgot to add - there is an admin tool to reassign replicas,
>>>> >> but it would take longer than leader failover.
>>>> >>
>>>> >> Joel
>>>> >>
>>>> >> On Tuesday, July 9, 2013, Joel Koshy wrote:
>>>> >>
>>>> >> > 1 - no, unless broker4 is not the preferred leader. (The preferred
>>>> >> > leader is the first broker in the assigned replica list). If a
>>>> >> > non-preferred replica is the current leader you can run the
>>>> >> > PreferredReplicaLeaderElection admin command to move the leader.
>>>> >> > 2 - The actual leader movement (on leader failover) is fairly fast -
>>>> >> > probably on the order of tens of ms. However, clients (producers,
>>>> >> > consumers) may take longer to detect it (a client needs to get back
>>>> >> > an error response, handle an exception, issue a metadata request, and
>>>> >> > get the response to find the new leader. All of that can add up, but
>>>> >> > it should not be terribly high - I'm guessing on the order of a few
>>>> >> > hundred ms
>>>
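
For readers following the thread, the failover behaviour Joel describes can be sketched as a small toy model. This is not Kafka code and uses no Kafka API: `ToyCluster`, `NotLeaderError`, `send_with_refresh`, and the broker ids 4, 5, 6 are all hypothetical names chosen for illustration. It assumes only what the thread states: the preferred leader is the first broker in the assigned replica list, another assigned replica takes over when the leader goes down, and a client recovers by getting an error, refreshing metadata, and retrying against the new leader. Real Kafka also tracks in-sync replicas and coordinates through ZooKeeper, which this sketch ignores.

```python
class NotLeaderError(Exception):
    """Hypothetical error: the contacted broker is down or not the leader."""


class ToyCluster:
    """Toy model of a single partition. An illustration, not Kafka code."""

    def __init__(self, assigned_replicas):
        self.assigned_replicas = list(assigned_replicas)
        self.live = set(assigned_replicas)
        self.leader = self.preferred_leader()

    def preferred_leader(self):
        # Per the thread: the first broker in the assigned replica list.
        return self.assigned_replicas[0]

    def _elect(self):
        # Simplifying assumption: pick the first live assigned replica.
        self.leader = next(
            (b for b in self.assigned_replicas if b in self.live), None
        )

    def shutdown(self, broker_id):
        self.live.discard(broker_id)
        if self.leader == broker_id:
            self._elect()

    def restart(self, broker_id):
        # A restarted broker rejoins but does not take leadership back.
        self.live.add(broker_id)
        if self.leader is None:
            self._elect()

    def preferred_replica_election(self):
        # Models moving leadership back to the preferred replica.
        if self.preferred_leader() in self.live:
            self.leader = self.preferred_leader()

    def metadata(self):
        # Stand-in for a metadata request: returns the current leader id.
        return self.leader

    def handle_produce(self, broker_id, message):
        if broker_id not in self.live or broker_id != self.leader:
            raise NotLeaderError(broker_id)
        return "ack"


def send_with_refresh(cluster, cached_leader, message, max_retries=3):
    """Send to the cached leader; on error, refresh metadata and retry.

    Returns (leader_used, number_of_retries).
    """
    leader = cached_leader
    for attempt in range(max_retries + 1):
        try:
            cluster.handle_produce(leader, message)
            return leader, attempt
        except NotLeaderError:
            leader = cluster.metadata()
            if leader is None:
                raise RuntimeError("no live replica available to lead")
    raise RuntimeError("retries exhausted")


if __name__ == "__main__":
    cluster = ToyCluster([4, 5, 6])
    cluster.shutdown(4)                          # leader fails over
    print(send_with_refresh(cluster, 4, "hello"))
    cluster.restart(4)                           # 4 is back, not yet leader
    cluster.preferred_replica_election()
    print(cluster.metadata())
```

In this model a client holding a stale leader id pays one extra round trip (error, then metadata refresh) before its send succeeds, which matches Joel's point that client-side detection costs more than the leader movement itself. The model also matches the expectation from the thread that with replication factor 3 the last remaining replica should become leader after two leader shutdowns; it does not attempt to explain the error Calvin reported.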
 