Re: Copy availability when broker goes down?
Chris,

As Neha said, the 1st copy of a partition is the preferred replica, and we
try to spread the preferred replicas evenly across the brokers. When a
broker is restarted, we don't automatically move the leader back to the
preferred replica, though. You will have to run a command-line tool,
PreferredReplicaLeaderElectionCommand, to balance the leaders again.
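
A minimal invocation of that tool from an 0.8 checkout looks roughly like
this (the ZooKeeper connect string is a placeholder for your cluster's
zookeeper.connect setting):

bin/kafka-run-class.sh kafka.admin.PreferredReplicaLeaderElectionCommand \
  --zookeeper localhost:2181

If I remember right, without a --path-to-json-file argument it attempts the
election for all topic partitions.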

Also, I recommend that you try the latest code in 0.8. A bunch of issues
have been fixed since January. You will have to wipe out all your ZK and
Kafka data first, though.
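
(With the quickstart defaults, that means stopping everything and removing
the log.dirs directory on each broker, e.g. /tmp/kafka-logs, plus the
ZooKeeper data directory, e.g. /tmp/zookeeper; your paths may differ.)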

Thanks,

Jun

On Mon, Mar 4, 2013 at 8:32 AM, Chris Curtin <[EMAIL PROTECTED]> wrote:

> Hi,
>
> (Hmm, take 2: Apache's spam filter doesn't like the word for a copy of
> the data, 'R - E - P - L - I - C - A', and blocked my first attempt at
> sending this. I'm using 'copy' below to mean that concept.)
>
> I'm running 0.8.0 built from HEAD at the end of January (not the merge
> you guys did last night).
>
> I'm testing how the producer responds to the loss of brokers, what errors
> are produced, etc., and noticed some strange things as I shut down servers
> in my cluster.
>
> Setup:
> 4 node cluster
> 1 topic, 3 copies in the set
> 10 partitions numbered 0-9
>
> State of the cluster is determined using TopicMetadataRequest.
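>
> (For reference, the dumps below come from a small Java client roughly
> along these lines. This is a sketch against the 0.8 SimpleConsumer
> javaapi; the broker host, port and topic name are placeholders:
>
> import java.util.Collections;
> import kafka.cluster.Broker;
> import kafka.javaapi.PartitionMetadata;
> import kafka.javaapi.TopicMetadata;
> import kafka.javaapi.TopicMetadataRequest;
> import kafka.javaapi.TopicMetadataResponse;
> import kafka.javaapi.consumer.SimpleConsumer;
>
> public class ClusterState {
>     public static void main(String[] args) {
>         // Ask any live broker for metadata; host, port and topic are placeholders.
>         SimpleConsumer consumer =
>                 new SimpleConsumer("vrd01.atlnp1", 9092, 100000, 64 * 1024, "state-check");
>         try {
>             TopicMetadataRequest request =
>                     new TopicMetadataRequest(Collections.singletonList("mytopic"));
>             TopicMetadataResponse response = consumer.send(request);
>             for (TopicMetadata topic : response.topicsMetadata()) {
>                 for (PartitionMetadata part : topic.partitionsMetadata()) {
>                     // One line per partition: id, leader, copy set, ISR.
>                     StringBuilder line = new StringBuilder("Java: ");
>                     line.append(part.partitionId()).append(':');
>                     line.append(part.leader() != null ? part.leader().host() : "none");
>                     line.append(" R:[");
>                     for (Broker b : part.replicas()) line.append(' ').append(b.host());
>                     line.append("] I:[");
>                     for (Broker b : part.isr()) line.append(' ').append(b.host());
>                     line.append(']');
>                     System.out.println(line);
>                 }
>             }
>         } finally {
>             consumer.close();
>         }
>     }
> }
>
> The listings below are its output.)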
>
> When I start with a full cluster (the 2nd column is the partition id,
> followed by the leader, then the copy set and ISR):
>
> Java: 0:vrd03.atlnp1 R:[  vrd03.atlnp1 vrd04.atlnp1 vrd01.atlnp1] I:[
> vrd03.atlnp1 vrd04.atlnp1 vrd01.atlnp1]
> Java: 1:vrd04.atlnp1 R:[  vrd04.atlnp1 vrd01.atlnp1 vrd02.atlnp1] I:[
> vrd04.atlnp1 vrd01.atlnp1 vrd02.atlnp1]
> Java: 2:vrd03.atlnp1 R:[  vrd01.atlnp1 vrd02.atlnp1 vrd03.atlnp1] I:[
> vrd03.atlnp1 vrd01.atlnp1 vrd02.atlnp1]
> Java: 3:vrd03.atlnp1 R:[  vrd02.atlnp1 vrd03.atlnp1 vrd04.atlnp1] I:[
> vrd03.atlnp1 vrd04.atlnp1 vrd02.atlnp1]
> Java: 4:vrd03.atlnp1 R:[  vrd03.atlnp1 vrd01.atlnp1 vrd02.atlnp1] I:[
> vrd03.atlnp1 vrd01.atlnp1 vrd02.atlnp1]
> Java: 5:vrd03.atlnp1 R:[  vrd04.atlnp1 vrd02.atlnp1 vrd03.atlnp1] I:[
> vrd03.atlnp1 vrd04.atlnp1 vrd02.atlnp1]
> Java: 6:vrd03.atlnp1 R:[  vrd01.atlnp1 vrd03.atlnp1 vrd04.atlnp1] I:[
> vrd03.atlnp1 vrd04.atlnp1 vrd01.atlnp1]
> Java: 7:vrd04.atlnp1 R:[  vrd02.atlnp1 vrd04.atlnp1 vrd01.atlnp1] I:[
> vrd04.atlnp1 vrd01.atlnp1 vrd02.atlnp1]
> Java: 8:vrd03.atlnp1 R:[  vrd03.atlnp1 vrd02.atlnp1 vrd04.atlnp1] I:[
> vrd03.atlnp1 vrd04.atlnp1 vrd02.atlnp1]
> Java: 9:vrd03.atlnp1 R:[  vrd04.atlnp1 vrd03.atlnp1 vrd01.atlnp1] I:[
> vrd03.atlnp1 vrd04.atlnp1 vrd01.atlnp1]
>
> When I stop vrd01, which isn't the leader for any partition:
>
> Java: 0:vrd03.atlnp1 R:[ ] I:[]
> Java: 1:vrd04.atlnp1 R:[ ] I:[]
> Java: 2:vrd03.atlnp1 R:[ ] I:[]
> Java: 3:vrd03.atlnp1 R:[  vrd02.atlnp1 vrd03.atlnp1 vrd04.atlnp1] I:[
> vrd03.atlnp1 vrd04.atlnp1 vrd02.atlnp1]
> Java: 4:vrd03.atlnp1 R:[ ] I:[]
> Java: 5:vrd03.atlnp1 R:[  vrd04.atlnp1 vrd02.atlnp1 vrd03.atlnp1] I:[
> vrd03.atlnp1 vrd04.atlnp1 vrd02.atlnp1]
> Java: 6:vrd03.atlnp1 R:[ ] I:[]
> Java: 7:vrd04.atlnp1 R:[ ] I:[]
> Java: 8:vrd03.atlnp1 R:[  vrd03.atlnp1 vrd02.atlnp1 vrd04.atlnp1] I:[
> vrd03.atlnp1 vrd04.atlnp1 vrd02.atlnp1]
> Java: 9:vrd03.atlnp1 R:[ ] I:[]
>
> Does this mean that none of the partitions that used to have a copy on
> vrd01 are updating ANY of the copies?
>
> I ran another test, again starting with a full cluster where all
> partitions had a full set of copies. When I stopped the broker that was
> the leader for 9 of the 10 partitions, the new leaders were all elected on
> one machine instead of being spread across the remaining three. Should the
> leaders have been spread out better? Also, the copy sets weren't fully
> populated afterwards either.
>
> Last test: I started with a full cluster showing all copies available,
> then stopped a broker that was not the leader for any partition. I noticed
> that the partitions where the stopped machine was in the copy set didn't
> show any copies, as above. I let the cluster sit for 30 minutes and didn't
> see any new copies being brought online. How should the cluster handle a
> machine that is down for an extended period of time?
>
> I don’t have a new machine I could add to the cluster, but what happens