Kafka >> mail # user >> Kafka broker not respecting log.roll.hours?


+ Dan Frankowski 2013-04-25, 19:45
+ Jun Rao 2013-04-26, 04:49
+ Dan Frankowski 2013-04-26, 06:13

Re: Kafka broker not respecting log.roll.hours?
Yes, for a low-volume topic, the time-based rolling can be imprecise. Could
you file a jira and describe your suggestions there? Ideally, we should set
firstAppendTime to the file creation time. However, it doesn't seem that you
can get the creation time in Java.

Thanks,

Jun
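For what it's worth, Java 7's NIO.2 API does expose a creation timestamp via BasicFileAttributes on filesystems that record one (on filesystems that don't, it falls back to the last-modified time or the epoch). A minimal sketch, with an illustrative segment-style file name:

```java
// Sketch: reading a file's creation time with Java 7 NIO.2.
// creationTime() falls back to last-modified (or the epoch) on
// filesystems that do not store a creation timestamp.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

public class CreationTimeSketch {
    public static void main(String[] args) throws IOException {
        // Stand-in for a log segment file (name is illustrative only).
        Path segment = Files.createTempFile("00000000000000000000", ".log");
        BasicFileAttributes attrs =
                Files.readAttributes(segment, BasicFileAttributes.class);
        System.out.println("created: " + attrs.creationTime());
        Files.delete(segment);
    }
}
```

So firstAppendTime could plausibly be seeded from the filesystem on startup, at least on platforms that record creation times.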
On Thu, Apr 25, 2013 at 11:12 PM, Dan Frankowski <[EMAIL PROTECTED]> wrote:

> We have high-volume topics and low-volume topics. The problem occurs more
> often for low-volume topics to be sure.
>
> But if my hypothesis is correct about why it is happening, here is a case
> where rolling is longer than an hour, even on a high volume topic:
>
> - write to a topic for 20 minutes
> - restart the broker
> - wait for 5 days
> - write to a topic for 20 minutes
> - restart the broker
> - write to a topic for an hour
>
> The rollover time was now 5 days, 1 hour, 40 minutes. You can make it as
> long as you want. Does this make sense?
>
> We would like the rollover time to be no more than an hour, even if the
> broker is restarted, or the topic is low-volume.
>
> The cleanest way to do that might be to roll over on the hour no matter
> when the file was started. That would be too fast sometimes, but that's
> fine. A second way would be to embed the first append time in the file
> name. A third way (not perfect, but an approximation at least) would be to
> not write to a segment if firstAppendTime is not defined and the
> timestamp on the file is more than an hour old. There are probably other
> ways.
>
> What say you?
>
>
> On Thu, Apr 25, 2013 at 9:49 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
>
> > That logic in 0.7.2 seems correct. Basically, firstAppendTime is set on
> > first append to a log segment. Then, later on, when a new message is
> > appended and the elapsed time since firstAppendTime is larger than the
> > roll time, a new segment is rolled. Is your data constantly being produced?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Thu, Apr 25, 2013 at 12:44 PM, Dan Frankowski <[EMAIL PROTECTED]>
> > wrote:
> >
> > > We are running Kafka 0.7.2. We set log.roll.hours=1. I hoped that meant
> > > logs would be rolled every hour, or more. Only, sometimes logs that are
> > > many hours (sometimes days) old have more data added to them. This
> > > perturbs our systems for reasons I won't get into.
> > >
> > > Have others observed this? Is it a bug? Is there a planned fix?
> > >
> > > I don't know Scala or Kafka well, but I have a proposal for why this
> > > might happen: upon restart, a broker forgets when its log files have
> > > been appended to ("firstAppendTime"). Then a potentially infinite
> > > amount of time later, the restarted broker receives another message for
> > > the particular (topic, partition), and starts the clock again. It will
> > > then roll over that log after an hour.
> > >
> > >
> > >
> >
> > > https://svn.apache.org/repos/asf/kafka/branches/0.7/core/src/main/scala/kafka/server/KafkaConfig.scala
> > > says:
> > >
> > >   /* the maximum time before a new log segment is rolled out */
> > >   val logRollHours = Utils.getIntInRange(props, "log.roll.hours", 24*7,
> > (1,
> > > Int.MaxValue))
> > >
> > >
> > >
> >
> > > https://svn.apache.org/repos/asf/kafka/branches/0.7/core/src/main/scala/kafka/log/Log.scala
> > > has maybeRoll, which needs segment.firstAppendTime defined. It also has
> > > updateFirstAppendTime() which says if it's empty, then set it.
> > >
> >
>

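The restart scenario Dan describes can be reproduced with a toy model of the hypothesized 0.7 roll check. This is a sketch with illustrative names, not Kafka's actual code (the real logic lives in Log.scala's maybeRoll); times are in minutes for readability:

```java
// Toy model of the hypothesized 0.7 roll check: firstAppendTime lives
// only in memory, so a broker restart forgets it, and the next append
// restarts the clock on an already-old segment.
import java.util.OptionalLong;

public class RollSketch {
    static final long ROLL_MINUTES = 60;  // log.roll.hours=1

    // In-memory only; a restart clears this (the hypothesized bug).
    static OptionalLong firstAppendTime = OptionalLong.empty();

    // Returns true if appending at time 'now' should roll a new segment.
    static boolean maybeRoll(long now) {
        if (firstAppendTime.isPresent()
                && now - firstAppendTime.getAsLong() > ROLL_MINUTES) {
            firstAppendTime = OptionalLong.of(now);  // new segment starts here
            return true;
        }
        if (!firstAppendTime.isPresent()) {
            firstAppendTime = OptionalLong.of(now);  // clock starts on first append
        }
        return false;
    }

    public static void main(String[] args) {
        long t = 0;
        maybeRoll(t);                            // write for 20 minutes
        t += 20;
        firstAppendTime = OptionalLong.empty();  // broker restart forgets it
        t += 5 * 24 * 60;                        // broker idle for 5 days
        boolean rolled = maybeRoll(t);
        System.out.println("rolled after 5 idle days? " + rolled);  // false
        // The days-old segment is appended to again, because the roll
        // clock only starts once this append sets firstAppendTime.
    }
}
```

Under this model the segment's effective lifetime is unbounded, which matches the "5 days, 1 hour, 40 minutes" arithmetic above; seeding firstAppendTime from the file's timestamp on startup would cap it again.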
+ Dan Frankowski 2013-04-26, 15:20
+ Jason Rosenberg 2013-04-26, 16:52
+ Adam Talaat 2013-04-26, 17:33
+ Dan Frankowski 2013-04-27, 21:37
+ Dan Frankowski 2013-05-02, 21:23
+ Jun Rao 2013-05-03, 15:58