Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Kafka, mail # dev - [jira] [Commented] (KAFKA-513) Add state change log to Kafka brokers


+
Neha Narkhede 2013-02-26, 00:58
+
Swapnil Ghike 2013-02-27, 19:43
Copy link to this message
-
[jira] [Commented] (KAFKA-513) Add state change log to Kafka brokers
"Neha Narkhede 2013-03-05, 01:06

    [ https://issues.apache.org/jira/browse/KAFKA-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592884#comment-13592884 ]

Neha Narkhede commented on KAFKA-513:
-------------------------------------

Thanks for the patch! This patch is great and is very close to being checked in. Few review suggestions -

1. ControllerChannelManager
1.1 The log message when the controller sends the data includes the controller's epoch, but the response doesn't. It is useful to know the epoch wherever the controller id is logged.
1.2 The "," between controller id and epoch probably could be skipped, this pattern is not followed elsewhere in this patch.

2. Partition
2.1 The error statement when the broker is dropping the leaderAndIsr request should be -

"since local leader epoch %d is >= the request's leader epoch"

2.2 Also, let's add "Broker %d aborted..." to the following statement similar to "Broker %d discarded..."

"Aborted the become-follower state change since leader %d for partition [%s,%d]"

3. PartitionStateMachine
In electLeaderForPartition() API, it is useful to include the contents of the StateChangedFailedException in the state change log. This is because it is useful to know that after starting the state change, it got aborted because another controller with a higher controller epoch was detected. So something like -
"Controller %d epoch %d aborted leader election for partition [%s,%d] since ..."

4. ReplicaManager

The log statements here are of the form of [Replica Manager on Broker %d]: Handling... This is different from the ones in Partition which are like "Broker %d started become-follower...". I guess the default logIdent, which is meant for the main log, is the cause of this discrepancy. For the purpose of the state change log, we actually don't care if it is Replica Manager or Partition on that broker, it would just suffice to have the simplest statement to communicate the state changes like the ones in Partition except the logIdent.

5. ReplicaStateMachine
Same as 4

6. Rename stateChangeLogMerger.scala to have 1st letter in caps.

7. StateChangeLogMerger

This tool looks great now. Thanks for including the review suggestions. Few minor observations -

7.1 You check if at least one of the two log input options are specified. I think we should also check that at most one of them is specified. In other words, what's the expected behavior if both are speci
fied ?

                
> Add state change log to Kafka brokers
> -------------------------------------
>
>                 Key: KAFKA-513
>                 URL: https://issues.apache.org/jira/browse/KAFKA-513
>             Project: Kafka
>          Issue Type: Sub-task
>    Affects Versions: 0.8
>            Reporter: Neha Narkhede
>            Assignee: Swapnil Ghike
>            Priority: Blocker
>              Labels: p1, replication, tools
>             Fix For: 0.8
>
>         Attachments: kafka-513-v1.patch, kafka-513-v2.patch, kafka-513-v3.patch, kafka-513-v4.patch
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Once KAFKA-499 is checked in, every controller to broker communication can be modelled as a state change for one or more partitions. Every state change request will carry the controller epoch. If there is a problem with the state of some partitions, it will be good to have a tool that can create a timeline of requested and completed state changes. This will require each broker to output a state change log that has entries like
> [2012-09-10 10:06:17,280] broker 1 received request LeaderAndIsr() for partition [foo, 0] from controller 2, epoch 1
> [2012-09-10 10:06:17,350] broker 1 completed request LeaderAndIsr() for partition [foo, 0] from controller 2, epoch 1
> On controller, this will look like -
> [2012-09-10 10:06:17,198] controller 2, epoch 1, initiated state change request LeaderAndIsr() for partition [foo, 0]
> We need a tool that can collect the state change log from all brokers and create a per-partition timeline of state changes -

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

 
+
Jun Rao 2013-03-06, 18:10
+
Neha Narkhede 2013-03-06, 05:18