Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Kafka, mail # dev - [jira] [Commented] (KAFKA-559) Garbage collect old consumer metadata entries

Copy link to this message
[jira] [Commented] (KAFKA-559) Garbage collect old consumer metadata entries
"Joel Koshy 2013-07-09, 07:20

    [ https://issues.apache.org/jira/browse/KAFKA-559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13703007#comment-13703007 ]

Joel Koshy commented on KAFKA-559:

Thanks for the patch. Overall, looks good.  Couple of comments, mostly minor
in no particular order:

* I think dry-run does not need any further qualifier such as withOptionalArg, describedAs, ofType - it's just a flag.
* For --since, I prefer the seconds since epoch over some fixed input format which then brings in ambiguity such as timezone 24h vs 12h, etc. A better alternative would be to accept date strings and use  the DateFormat class with lenient parsing turned on or something like that. --before may be more intuitive than --since.
* Can use CommandLineUtils.checkRequiredArgs
* deleteBy matching - prefer to use case match and thereby avoid the explicit check on valid values. Also the message on invalid value of deleteBy should inform what the valid values are.
* Right now you support the following modes: delete stale topics across all groups, delete stale topics in a specific group. I think it would be useful to make deleteBy optional - if unspecified, it scans all groups and gets rid of stale groups.
* line 75: warn ("msg", e)
* line 101: should provide a reason for aborting
* line 110: doesn't gropudirs have an offset path? if not maybe we should add it
* Logging should include last mtime as that may be useful information reported by the dry-run
* No need to add a wrapper shell script for the tool.
* make all of the methods except main private.
* The return statements can be dropped - i.e., just write the return value.
* Several vars can be vals instead.
* removeBrokerPartitionpairs: I don't think you would want to do a partial delete under a topic directory. You can check that all the partition offset paths are <= since and if so, just delete the topic path. With that the method would be better named something like deleteUnconsumedTopicsFromGroup?
* Finally, you are probably aware that there are a bunch of race conditions - e.g., checkIfLiveConsumers is a helpful check to have but not guaranteed to be correct as some consumers may creep in while the tool is running. However, I think it is reasonable for a tool like this to ignore that since a "since" value way back would mean the probability of that occuring is very low. Similar note for deleteGroupIfNoTopicExists.

> Garbage collect old consumer metadata entries
> ---------------------------------------------
>                 Key: KAFKA-559
>                 URL: https://issues.apache.org/jira/browse/KAFKA-559
>             Project: Kafka
>          Issue Type: New Feature
>            Reporter: Jay Kreps
>            Assignee: Tejas Patil
>              Labels: project
>         Attachments: KAFKA-559.v1.patch, KAFKA-559.v2.patch
> Many use cases involve tranient consumers. These consumers create entries under their consumer group in zk and maintain offsets there as well. There is currently no way to delete these entries. It would be good to have a tool that did something like
>   bin/delete-obsolete-consumer-groups.sh [--topic t1] --since [date] --zookeeper [zk_connect]
> This would scan through consumer group entries and delete any that had no offset update since the given date.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira