Kafka Broker Configuration Tuning and Repartitioning topic
Muthukumar 2012-11-19, 17:34
Hi,
We currently run a setup of 1 broker and 10 consumers, pointing to a 5-node ZooKeeper ensemble. The load is about 500 GB of data per day, and the following is the generic configuration we're using:
brokerid=0
port=9092
num.threads=8
socket.send.buffer=1048576
socket.receive.buffer=1048576
max.socket.request.bytes=104857600
log.dir=/data/kafka/server/logs
num.partitions=16
topic.partition.count.map=topic1:16,topic2:16,topic3:16,topic4:16,topic5:16,topic6:16,topic7:46,topic8:16
log.flush.interval=10000
log.default.flush.interval.ms=1000
log.default.flush.scheduler.interval.ms=1000
log.retention.hours=48
log.file.size=536870912
log.cleanup.interval.mins=1
enable.zookeeper=true
zk.connectiontimeout.ms=1000000
1) Please let us know whether this configuration is sound or needs a change.
2) We're seeing a huge backlog of 320 GB for topic6, which is split into 46 partitions. Is it possible to re-partition the existing data? 21 of the partitions were newly added, and the data lag on those is almost 0.
Let me know if more details are required. And thanks for your great help.
-Muthu
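For readers who want to sanity-check the numbers in the settings above, here is a minimal Python sketch that parses the posted properties and prints the size and timeout values in readable units. The helper names (parse_properties, parse_partition_map) are made up for illustration and are not part of Kafka. Note that in the map as posted it is topic7, not topic6, that is set to 46 partitions.

```python
# Sketch only: reads the broker settings as posted in the message above.
CONFIG = """\
brokerid=0
port=9092
num.threads=8
socket.send.buffer=1048576
socket.receive.buffer=1048576
max.socket.request.bytes=104857600
log.dir=/data/kafka/server/logs
num.partitions=16
topic.partition.count.map=topic1:16,topic2:16,topic3:16,topic4:16,topic5:16,topic6:16,topic7:46,topic8:16
log.flush.interval=10000
log.default.flush.interval.ms=1000
log.default.flush.scheduler.interval.ms=1000
log.retention.hours=48
log.file.size=536870912
log.cleanup.interval.mins=1
enable.zookeeper=true
zk.connectiontimeout.ms=1000000
"""

MIB = 1024 * 1024


def parse_properties(text):
    """Turn key=value lines into a dict, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def parse_partition_map(value):
    """Turn 'topic1:16,topic2:16' into {'topic1': 16, 'topic2': 16}."""
    result = {}
    for entry in value.split(","):
        topic, _, count = entry.partition(":")
        result[topic.strip()] = int(count)
    return result


props = parse_properties(CONFIG)
partition_map = parse_partition_map(props["topic.partition.count.map"])
segment_mib = int(props["log.file.size"]) / MIB
max_request_mib = int(props["max.socket.request.bytes"]) / MIB
zk_timeout_minutes = int(props["zk.connectiontimeout.ms"]) / 60000

print("partitions per topic:", partition_map)
print("log segment size (MiB):", segment_mib)
print("max socket request (MiB):", max_request_mib)
print("ZooKeeper connection timeout (minutes):", round(zk_timeout_minutes, 1))
```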
Re: Kafka Broker Configuration Tuning and Repartitioning topic
Jun Rao 2012-11-20, 05:39
The configs look reasonable. Currently, we don't repartition existing data. Only new messages will consider the newly added partitions.
Thanks
Jun
Re: Kafka Broker Configuration Tuning and Repartitioning topic
Muthukumar 2012-11-20, 11:12
Hi Jun,
Thanks for the response.
a) Is there any plan on the roadmap to address re-partitioning, or balancing existing data across newly added partitions? Please let me know so that we can file a JIRA for this.
b) Do we need to go for more partitions for topic6 (46 to ??) to keep up with new requests and reduce the backlog?
-Muthu
Re: Kafka Broker Configuration Tuning and Repartitioning topic
Neha Narkhede 2012-11-20, 16:21
Muthu,
a) Not as of now. Please feel free to create the JIRA and specify the details there.
b) I doubt increasing partitions will help. 500 GB/day/topic suggests the data per partition is only about 10 GB/day. Before thinking about increasing the number of partitions, I would try a few things:
1. Inspect the consumer throughput metrics through the mbeans exposed on the Kafka consumers.
2. If individual consumer throughput looks reasonable, then deploy more consumer instances and see if that helps. Since you have 40-50 partitions per topic, you can have at least that many consumer instances.
3. If not, then check whether the consumers post-process the data consumed from these partitions. If this processing is slow, your consumption rate will drop.
Thanks, Neha
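The arithmetic behind the estimate above can be sketched as follows. The 500 GB/day, 46 partitions and 320 GB backlog come from the thread; the consumption rate of 660 GB/day is a made-up figure for illustration only, and the function names are hypothetical.

```python
def per_partition_gb_per_day(topic_gb_per_day, partitions):
    """Average daily volume landing on each partition, assuming an even spread."""
    return topic_gb_per_day / partitions


def days_to_drain(backlog_gb, consume_gb_per_day, produce_gb_per_day):
    """Days needed to clear a backlog when consumers outpace producers.

    Returns infinity when consumption does not exceed production,
    because the backlog can then never shrink.
    """
    surplus = consume_gb_per_day - produce_gb_per_day
    if surplus <= 0:
        return float("inf")
    return backlog_gb / surplus


# Figures from the thread: 500 GB/day on a topic with 46 partitions.
print(round(per_partition_gb_per_day(500, 46), 1), "GB/day per partition")

# Hypothetical: consumers sustain 660 GB/day against 500 GB/day of new data.
print(days_to_drain(320, 660, 500), "days to clear the 320 GB backlog")
```

The second function is the reason the advice focuses on consumer throughput: the backlog only shrinks when the total consumption rate exceeds the production rate, however many partitions the topic has.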
Re: Kafka Broker Configuration Tuning and Repartitioning topic
Muthukumar 2012-11-20, 19:17
Hi Neha,
Thanks for the response. We're currently working on hooking the exposed mbeans up to our collectors so that we can monitor them.
Since repartitioning is not supported, it would be great to know whether we can move the files from one partition to another so that they get picked up. Will that work?
Noads-8:
total 9886868
-rw-r--r--. 1 root root 536871644 Nov 18 20:38 00000000037581033857.kafka
-rw-r--r--. 1 root root 536871327 Nov 18 22:01 00000000038117905501.kafka
-rw-r--r--. 1 root root 536871525 Nov 18 23:22 00000000038654776828.kafka
-rw-r--r--. 1 root root 536871520 Nov 19 00:40 00000000039191648353.kafka
-rw-r--r--. 1 root root 536871057 Nov 19 01:56 00000000039728519873.kafka
Noads-9:
total 9891380
-rw-r--r--. 1 root root 536871893 Nov 18 20:37 00000000037581035640.kafka
-rw-r--r--. 1 root root 536871274 Nov 18 22:00 00000000038117907533.kafka
-rw-r--r--. 1 root root 536872062 Nov 18 23:21 00000000038654778807.kafka
-rw-r--r--. 1 root root 536872190 Nov 19 00:40 00000000039191650869.kafka
If we move one of the files in partition #9 of Noads to partition #8, will that work? Thanks.
-Muthu
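One thing the listing above does show is that, within each partition directory, every file name equals the previous file name plus the previous file's size in bytes. That pattern suggests the names are byte offsets into that partition's log. This is an inference from the posted numbers, not something stated in the thread. The Python sketch below checks the pattern and then tries a hypothetical move of one Noads-9 file into Noads-8; the helper names are made up for illustration.

```python
# (file name, size in bytes) pairs copied from the listing in the message above.
LISTING = {
    "Noads-8": [
        ("00000000037581033857.kafka", 536871644),
        ("00000000038117905501.kafka", 536871327),
        ("00000000038654776828.kafka", 536871525),
        ("00000000039191648353.kafka", 536871520),
        ("00000000039728519873.kafka", 536871057),
    ],
    "Noads-9": [
        ("00000000037581035640.kafka", 536871893),
        ("00000000038117907533.kafka", 536871274),
        ("00000000038654778807.kafka", 536872062),
        ("00000000039191650869.kafka", 536872190),
    ],
}


def start_offset(file_name):
    """Interpret the numeric part of the file name as a byte offset."""
    return int(file_name.split(".")[0])


def is_contiguous(segments):
    """True if each file starts exactly where the previous one ends."""
    for (name, size), (next_name, _) in zip(segments, segments[1:]):
        if start_offset(name) + size != start_offset(next_name):
            return False
    return True


for partition, segments in LISTING.items():
    print(partition, "contiguous:", is_contiguous(segments))

# Hypothetical move: the second Noads-9 file replaces the second Noads-8 file.
mixed = [LISTING["Noads-8"][0], LISTING["Noads-9"][1], LISTING["Noads-8"][2]]
print("after hypothetical move, contiguous:", is_contiguous(mixed))
```

If the names really are offsets, a file taken from partition 9 would not line up with the offset sequence of partition 8, which is one reason to be cautious about moving files by hand.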
Re: Kafka Broker Configuration Tuning and Repartitioning topic
Jay Kreps 2012-11-20, 20:37
I think this may be a terminology issue. By "re-partitioning" I think Neha means taking data currently on disk and splitting it into a different number of partitions on different servers. We can't really do this because the partition function is something computed on the client.
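To illustrate why a client-side partition function makes this hard, here is a minimal Python sketch of a hash-modulo partitioner. The CRC32-based function is an assumption chosen for determinism; it is not Kafka's actual partitioner, and the key names are invented. The counts 25 and 46 follow from the thread (46 partitions now, 21 of them newly added).

```python
import zlib


def partition_for(key, num_partitions):
    """Hypothetical client-side partitioner: stable hash of the key, modulo partition count."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def moved_fraction(keys, old_count, new_count):
    """Fraction of keys whose partition changes when the partition count changes."""
    moved = sum(
        1 for key in keys
        if partition_for(key, old_count) != partition_for(key, new_count)
    )
    return moved / len(keys)


# Invented keys, for illustration only.
keys = [f"key-{i}" for i in range(1000)]

fraction = moved_fraction(keys, 25, 46)
print(f"keys mapped to a different partition after 25 -> 46: {fraction:.1%}")
```

With a function of this shape, most keys land on a different partition once the count changes. The broker only sees the partition the client chose, so it has no way to redistribute data already on disk to match the new mapping.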
A different issue is migrating partitions to different servers; that will be supported. This is the standard kind of over-partitioning setup you would expect in many distributed systems (i.e. you create up front a fixed number of partitions which doesn't change, but you can move them around).
Another issue is changing the total number of partitions for a topic. This will eventually be supported, though maybe not in 0.8, if I understand correctly. You would do this if you wanted more parallelism in the topic. Even though we wouldn't go back and retrofit data into the new partitions, that is probably fine, as the old data would naturally cycle out once it falls out of the retention period.
-Jay