|
Taylor Gautier
2011-11-18, 05:03
Inder Pall
2011-11-18, 05:15
Taylor Gautier
2011-11-18, 05:20
Jun Rao
2011-11-18, 06:57
Inder Pall
2011-11-18, 06:59
Jun Rao
2011-11-18, 07:01
Taylor Gautier
2011-11-18, 15:02
Jun Rao
2011-11-18, 16:04
Taylor Gautier
2011-11-18, 16:19
Jun Rao
2011-11-18, 16:50
Taylor Gautier
2011-11-18, 16:52
Evan Chan
2011-11-18, 17:40
Jun Rao
2011-11-18, 18:02
Jun Rao
2011-11-18, 18:06
Taylor Gautier
2011-11-18, 18:07
Jun Rao
2011-11-18, 18:32
Taylor Gautier
2011-11-18, 19:50
Joel Koshy
2011-11-18, 23:39
Taylor Gautier
2011-11-19, 15:32
Jun Rao
2011-11-19, 17:17
Taylor Gautier
2011-11-19, 19:02
Chris Burroughs
2011-11-23, 16:22
Jun Rao
2011-11-23, 17:40
Taylor Gautier
2011-11-23, 19:53
|
-
the cleaner and log segments
Taylor Gautier
2011-11-18, 05:03
Hi,

We've noticed that the cleaner script in Kafka removes empty log segments but not the directories themselves. I am actually wondering something - I always assumed that Kafka could restore the latest offset for existing topics by scanning the log directory for all directories and scanning the directories for log segment files to restore the latest offset.

Now this conclusion I have made simply by observation - so it could be entirely wrong.

My question is however - if I am right, and the cleaner removes all the log segments for a given topic so that a given topic directory is empty, how does Kafka behave when restarted? How does it know what the next offset should be?
-
Re: the cleaner and log segments
Inder Pall
2011-11-18, 05:15
The consumer offsets are stored in ZooKeeper by topic and partition. That's how in a consumer fail over scenario you don't get duplicate messages

- Inder
-
Re: the cleaner and log segments
Taylor Gautier
2011-11-18, 05:20
hmmm - and if you turn off zookeeper?
-
Re: the cleaner and log segments
Jun Rao
2011-11-18, 06:57
Taylor,

When you start a consumer, it always tries to get the last checkpointed offset from ZK. If no offset can be found in ZK, the consumer starts from either the smallest or the largest available offset in the broker.

Thanks,

Jun
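
The start-up rule Jun describes can be sketched roughly as follows. This is only an illustration of the decision, not Kafka's consumer code: the function name, its parameters, and the `reset_policy` switch are hypothetical, and the checkpoint is passed in as a plain value rather than read from ZooKeeper.

```python
from typing import Optional


def resolve_start_offset(checkpointed: Optional[int],
                         smallest: int,
                         largest: int,
                         reset_policy: str = "smallest") -> int:
    """Pick the offset a consumer starts reading from (illustrative only)."""
    # A checkpointed offset, when one exists, always wins.
    if checkpointed is not None:
        return checkpointed
    # No checkpoint: fall back to one end of what the broker still holds.
    if reset_policy == "smallest":
        return smallest
    if reset_policy == "largest":
        return largest
    raise ValueError("reset_policy must be 'smallest' or 'largest'")
```

With a checkpoint of 120 the consumer resumes at 120. Without one it starts at whichever end of the available range the policy selects, which is why replay (or skipping) is possible when no checkpoint exists.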
-
Re: the cleaner and log segments
Inder Pall
2011-11-18, 06:59
Jun & Taylor,

Would it be right to say that consumers without ZK aren't a viable option if your application can't handle replay of old messages?

- inder
-
Re: the cleaner and log segments
Jun Rao
2011-11-18, 07:01
This is true for the high-level ZK-based consumer.

Jun
-
Re: the cleaner and log segments
Taylor Gautier
2011-11-18, 15:02
I don't use high level consumers - just low level. What I was thinking was the following. Let's assume I have turned off ZK in my setup.

1) Send 1 message to topic A. Kafka creates a directory and log segment for A. The log segment starts at 0. Now, the "last offset" of the topic is a.

2) A consumer reads from topic A the message, and records that the most recent offset in topic A is a.

3) Much time passes, the cleaner runs, and deletes the log segment

4) More time passes, I restart Kafka. Kafka sees the topic A directory, but has no segment file to initialize from. So the "last offset" is considered to be 0.

5) Send 1 message to topic A. Kafka creates a log segment for A starting at 0. The new last offset of the topic is a'.

6) The consumer from step 2 tries to read from Kafka at offset a, but this is now an invalid offset.

Does that sound right? I haven't tried this yet, I'm just doing a thought experiment here to try to figure out what would happen.
-
Re: the cleaner and log segments
Jun Rao
2011-11-18, 16:04
4) is incorrect. The "last offset" remains 'a' even after the data is cleaned. So in 5), the offset will be 2 x 'a'. That is, we never recycle offsets. They keep increasing.

Thanks,

Jun
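
To make the "2 x 'a'" arithmetic concrete, here is a toy model of the behavior Jun describes, assuming offsets are byte positions in the log. `ToyLog` is a hypothetical class written for this note, not broker code. It keeps one segment per message for simplicity and models a running broker only, so it says nothing about how the value would survive a restart.

```python
class ToyLog:
    """Toy append-only log whose offsets are byte positions (illustrative)."""

    def __init__(self) -> None:
        self.segments = {}    # base offset -> payload stored in that segment
        self.next_offset = 0  # position at which the next message will start

    def append(self, payload: bytes) -> int:
        """Store a message and return the new last offset."""
        start = self.next_offset
        self.segments[start] = payload
        self.next_offset = start + len(payload)
        return self.next_offset

    def clean(self) -> None:
        """Drop all stored data; the offset counter is deliberately kept."""
        self.segments.clear()


log = ToyLog()
a = log.append(b"x" * 100)       # last offset is now a
log.clean()                      # the cleaner removes the old segment
second = log.append(b"y" * 100)  # a same-sized message ends at 2 * a
```

The new segment starts at `a` rather than 0, so the offset the consumer recorded in step 2 is still a valid position to read from.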
-
Re: the cleaner and log segments
Taylor Gautier
2011-11-18, 16:19
how? where is the information kept? If ZK is not around, and it's not on disk, how is this information passed to the next process after the restart?
-
Re: the cleaner and log segments
Jun Rao
2011-11-18, 16:50
What I described is what happens in the broker. If you use SimpleConsumer, then it's the consumer's responsibility to remember the last offset. The server doesn't store the state for consumers.

Thanks,

Jun
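
One way a low-level consumer might "remember the last offset" itself is a small checkpoint file, sketched below. This is an assumption about how an application could do it, not something Kafka provides: `save_offset`, `load_offset`, and the file layout are hypothetical, and no Kafka client calls are involved.

```python
import os
import tempfile


def save_offset(path: str, offset: int) -> None:
    """Persist the consumer's last offset, replacing the old file atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename it over
    # the checkpoint so a crash never leaves a half-written file behind.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(str(offset))
    os.replace(tmp_path, path)


def load_offset(path: str, default: int = 0) -> int:
    """Return the saved offset, or `default` if nothing was saved yet."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except FileNotFoundError:
        return default
```

The consumer would call `load_offset` once at start-up and `save_offset` after processing each batch. Saving only after processing means a crash can cause replay of the last batch but never a skipped message.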
-
Re: the cleaner and log segments
Taylor Gautier
2011-11-18, 16:52
Right. I'm talking about the broker. Where does it store the most recent offset if there are no log segments and no ZK?
-
Re: the cleaner and log segments
Evan Chan
2011-11-18, 17:40
Jun,

How do offsets keep increasing? Eventually they have to roll over back to 0, right? If Kafka runs for months, the offset eventually rolls back, right?
-
Re: the cleaner and log segments Jun Rao 2011-11-18, 18:02
In the broker, the name of each log file contains the offset of the first message in that file. So the last offset can be computed as the offset in the filename plus the file length.
Jun
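The filename-plus-length rule can be sketched as follows. This is a minimal illustration, not Kafka's own code: the helper names `next_offset` and `demo` are hypothetical, and the sketch assumes segment files are named with a zero-padded starting byte offset and a `.kafka` suffix, and that offsets are byte positions in the log.

```python
import os
import tempfile

SUFFIX = ".kafka"  # assumed segment file suffix


def next_offset(log_dir):
    """Return the offset the next appended message would receive.

    Each segment's name is taken to be the byte offset of its first message,
    so the end of the log is (start of newest segment) + (its length).
    Returns None when the directory contains no segment files at all.
    """
    segments = {
        int(name[: -len(SUFFIX)]): name
        for name in os.listdir(log_dir)
        if name.endswith(SUFFIX)
    }
    if not segments:
        return None  # nothing on disk to recover the offset from
    last_start = max(segments)
    last_path = os.path.join(log_dir, segments[last_start])
    return last_start + os.path.getsize(last_path)


def demo():
    """Build a throwaway log directory with two segments and inspect it."""
    with tempfile.TemporaryDirectory() as d:
        before = next_offset(d)
        with open(os.path.join(d, f"{0:020d}{SUFFIX}"), "wb") as f:
            f.write(b"x" * 100)  # first segment covers offsets 0..99
        with open(os.path.join(d, f"{100:020d}{SUFFIX}"), "wb") as f:
            f.write(b"y" * 26)   # second segment starts at offset 100
        return before, next_offset(d)
```

Note the `None` case: with no segment file left in the directory, this rule has nothing to compute from, which is the situation asked about in this thread.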
-
Re: the cleaner and log segments Jun Rao 2011-11-18, 18:06
Evan,
We don't roll back offsets at this moment. Since the offset is a long, it can last for a really long time. If you write 1TB a day, you can keep going for roughly 8 million days. Plus, you can always use more partitions (each partition has its own offset).
Thanks,
Jun
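As a rough check on the scale involved, the estimate can be worked out directly. The sketch below assumes that offsets are byte positions held in a signed 64-bit long and that "1TB a day" means 2^40 bytes per day; both are assumptions made for illustration, and the function name is hypothetical. Under them the result lands in the millions of days.

```python
MAX_OFFSET = 2**63 - 1   # largest value of a signed 64-bit long
BYTES_PER_DAY = 2**40    # assumption: "1TB a day" read as 1 TiB


def days_until_exhausted(max_offset=MAX_OFFSET, bytes_per_day=BYTES_PER_DAY):
    """Whole days of writing before a byte offset would exceed max_offset."""
    return max_offset // bytes_per_day


days = days_until_exhausted()
years = days // 365  # coarse conversion, ignoring leap years
print(f"{days:,} days, about {years:,} years")
```

Since each partition keeps its own offset, spreading the same traffic over more partitions stretches this horizon further.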
-
Re: the cleaner and log segments Taylor Gautier 2011-11-18, 18:07
Well, interestingly enough, I just checked the logs, and the problem I was sort of thinking might happen already did. Here it is:

[2011-11-18 09:31:52,255] INFO Deleting log segment 00000000000000016226.kafka from cards_card_1185934476-0 (kafka.log.LogManager)
[2011-11-18 09:31:52,255] WARN Delete failed. (kafka.log.LogManager)
[2011-11-18 09:31:52,255] INFO Deleting log segment 00000000000000000026.kafka from healthCheck1320643480188-0 (kafka.log.LogManager)
[2011-11-18 09:31:52,255] INFO Deleting log segment 00000000000000000028.kafka from healthCheck1319860947508-0 (kafka.log.LogManager)
[2011-11-18 09:31:52,255] ERROR error when processing request topic:cards_card_1185934476, part:0 offset:16226 maxSize:1048576 kafka.common.OffsetOutOfRangeException: offset 16226 is out of range
kafka.common.OffsetOutOfRangeException: offset 16226 is out of range
    at kafka.log.Log$.findRange(Log.scala:47)
    at kafka.log.Log.read(Log.scala:223)
    at kafka.server.KafkaRequestHandlers.kafka$server$KafkaRequestHandlers$$readMessageSet(KafkaRequestHandlers.scala:125)
    at kafka.server.KafkaRequestHandlers.handleFetchRequest(KafkaRequestHandlers.scala:107)
    at kafka.server.KafkaRequestHandlers$$anonfun$handlerFor$2.apply(KafkaRequestHandlers.scala:42)
    at kafka.server.KafkaRequestHandlers$$anonfun$handlerFor$2.apply(KafkaRequestHandlers.scala:42)
    at kafka.network.Processor.handle(SocketServer.scala:268)
    at kafka.network.Processor.read(SocketServer.scala:291)
    at kafka.network.Processor.run(SocketServer.scala:202)
    at java.lang.Thread.run(Thread.java:619)
 (kafka.server.KafkaRequestHandlers)

You see the issue? The consumer had previously read messages up to offset 16226. The cleaner came and took out the segment in the directory, so there are no more segments. The consumer came and asked for offset 16226, and it's now invalid. I had previously thought this might occur only after a restart, but it appears to happen even without a restart.
-
Re: the cleaner and log segments Jun Rao 2011-11-18, 18:32
Taylor,
If you request an offset whose corresponding log file has been deleted, you will get an OffsetOutOfRangeException. When this happens, you can use the getLatestOffset api in SimpleConsumer to obtain either the current valid smallest or largest offset and reconsume from there. Our high level consumer does that for you (among many other things). That's why we encourage most users to use the high level api instead.
Thanks,
Jun
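The recovery pattern described here (catch the out-of-range error, then restart from a currently valid offset) can be sketched as below. Everything in this block is a stand-in written for illustration: `FakeBroker`, `OffsetOutOfRange` and `fetch_with_reset` are hypothetical names and an in-memory model, not the Kafka client API, and the sketch assumes offsets are byte positions.

```python
class OffsetOutOfRange(Exception):
    """Stand-in for the broker's out-of-range error (hypothetical class)."""


class FakeBroker:
    """In-memory model of one partition after cleaning; not a real client."""

    def __init__(self, start, data):
        self.start = start  # first byte offset still retained
        self.data = data    # bytes retained from that offset onward

    def smallest_offset(self):
        return self.start

    def largest_offset(self):
        return self.start + len(self.data)

    def fetch(self, offset):
        if not self.smallest_offset() <= offset <= self.largest_offset():
            raise OffsetOutOfRange(f"offset {offset} is out of range")
        return self.data[offset - self.start:]


def fetch_with_reset(broker, offset, reset_to="smallest"):
    """Fetch from offset; on an out-of-range error, restart from a valid one.

    reset_to="smallest" replays everything still retained, while
    reset_to="largest" skips ahead to new data only.
    Returns the offset actually used together with the fetched bytes.
    """
    try:
        return offset, broker.fetch(offset)
    except OffsetOutOfRange:
        if reset_to == "smallest":
            new_offset = broker.smallest_offset()
        else:
            new_offset = broker.largest_offset()
        return new_offset, broker.fetch(new_offset)
```

Choosing between the smallest and largest offset is a trade-off between replaying retained messages and skipping them, so a consumer doing this by hand has to decide which one its application can tolerate.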
-
Re: the cleaner and log segments Taylor Gautier 2011-11-18, 19:50
Ok, that's what we are already doing. In essence, when that happens it is a bit like a rollover. Except, depending on the values, it might be the case that a consumer has a low enough offset that when it requests the topic the value is still within range but is no longer valid, since new messages were delivered to the broker in the meantime. Essentially it's a race condition that might be somewhat hard to induce but is theoretically possible. During a rollover of a 64-bit offset this is more or less never going to happen, because 64 bits is just too large to open a time window long enough for the race to occur.
-
Re: the cleaner and log segments Joel Koshy 2011-11-18, 23:39
Just want to see if I understand this right - when the log cleaner does its thing, even if all the segments are eligible for garbage collection, the cleaner will nuke those files and should deposit an empty segment file named with the next valid offset in that partition. I think Taylor encountered a case where that empty segment was not added. Is this the race condition that you speak of? For example, if the broker crashes before that empty segment file is created...
Also, I have seen the log cleaner act up more than once in the past - it basically seems to get scheduled continuously and delete file 0000... I think someone else on the list saw that before. I have been unable to reproduce that though - and it is not impossible that there was a misconfiguration at play.
Thanks,
Joel
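The behavior described above, deleting every segment but leaving an empty one whose name records the next valid offset, might look like the sketch below. It is an illustration under assumptions rather than the broker's cleaner: `clean_all`, `segment_name` and `demo` are hypothetical names, and the zero-padded `.kafka` naming and byte-position offsets are assumed.

```python
import os
import tempfile

SUFFIX = ".kafka"  # assumed segment file suffix


def segment_name(offset):
    """Zero-padded file name for a segment starting at the given offset."""
    return f"{offset:020d}{SUFFIX}"


def clean_all(log_dir):
    """Delete every segment, then leave an empty one named with the next offset.

    Returns the offset at which the log resumes, or None if there was
    nothing to clean.
    """
    segments = {
        int(name[: -len(SUFFIX)]): name
        for name in os.listdir(log_dir)
        if name.endswith(SUFFIX)
    }
    if not segments:
        return None
    last_start = max(segments)
    last_path = os.path.join(log_dir, segments[last_start])
    resume_at = last_start + os.path.getsize(last_path)
    for name in segments.values():
        os.remove(os.path.join(log_dir, name))
    # Empty placeholder: its name alone records where the log continues.
    # A crash between the deletes above and this create would lose that state.
    open(os.path.join(log_dir, segment_name(resume_at)), "wb").close()
    return resume_at


def demo():
    """Clean a one-segment log and report what is left on disk."""
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, segment_name(16000)), "wb") as f:
            f.write(b"m" * 226)  # segment covers offsets 16000..16225
        resumed_at = clean_all(d)
        return resumed_at, os.listdir(d)
```

With the placeholder in place, a restart can recover the next offset from the file name even though no message data remains.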
-
Re: the cleaner and log segments Taylor Gautier 2011-11-19, 15:32
Oh - well, if that's what is supposed to happen - it's not. I don't think the cause is a race condition. It seems to be intentional that it just removes the segment file and does not create anything, because the behavior is fairly consistent.
Note that I'm using 0.6.
-
Re: the cleaner and log segments Jun Rao 2011-11-19, 17:17
Could you try the 0.7 RC?
Thanks,
Jun
-
Re: the cleaner and log segments Taylor Gautier 2011-11-19, 19:02
At some point I will - right now we are unfortunately stuck with 0.6. The biggest problem is that the message format changed for the binary protocol, and I would have to upgrade my clients just to try it out.
-
Re: the cleaner and log segments Chris Burroughs 2011-11-23, 16:22
Was that "write an empty log segment" feature always there?
-
Re: the cleaner and log segments Jun Rao 2011-11-23, 17:40
Yes.
Jun
-
Re: the cleaner and log segments Taylor Gautier 2011-11-23, 19:53
Hmm…it *definitely* does not work right in 0.6. We actually take advantage of it to clean up dead topics. Our current use case is very different from what Kafka was designed for - we have hundreds of thousands of topics that individually get very little traffic. As you can surmise - not making topics on read (KAFKA-101) was a very important feature for this use case.
|