Flume, mail # dev - Slow behaviour on replay from FileChannel


Re: Slow behaviour on replay from FileChannel
Mike Percy 2013-06-23, 10:10
Good point Hari. I think a log file snippet would help to clarify.

Mike
On Sun, Jun 23, 2013 at 3:05 AM, Hari Shreedharan <[EMAIL PROTECTED]> wrote:

> This was likely due to the checkpoint being corrupt and automatically being
> cleaned up, causing a replay of all your files. Can you try enabling dual
> checkpoints? (You will need to use trunk or the upcoming 1.4 release for
> this feature, though.)
>
> Hari
>
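
For reference, a minimal sketch of what enabling dual checkpoints might look like in the agent's properties file. The channel name and the existing checkpoint and data paths are taken from the log snippet further down this thread; the agent name `agent1` and the backup directory path are placeholders that do not appear anywhere in the thread. The `useDualCheckpoints` and `backupCheckpointDir` settings are the File Channel properties documented for Flume 1.4 and later.

```properties
# "agent1" is a placeholder agent name (an assumption, not from the thread).
agent1.channels = troubleshootingFileChannel
agent1.channels.troubleshootingFileChannel.type = file
# Existing paths as they appear in the log snippet below.
agent1.channels.troubleshootingFileChannel.checkpointDir = /var/local/flume/troubleshooting-file-channel/checkpoint
agent1.channels.troubleshootingFileChannel.dataDirs = /var/local/flume/troubleshooting-file-channel/data
# Keep a backup copy of the checkpoint (requires trunk or Flume 1.4+).
agent1.channels.troubleshootingFileChannel.useDualCheckpoints = true
# Hypothetical path; it must differ from checkpointDir and from every dataDirs entry.
agent1.channels.troubleshootingFileChannel.backupCheckpointDir = /var/local/flume/troubleshooting-file-channel/checkpoint-backup
```

With a backup checkpoint available, a corrupt primary checkpoint should not by itself force a replay of every data file, which is the scenario Hari describes above.
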
> On Sunday, June 23, 2013, Mike Percy wrote:
>
> > Edward,
> > Someone told me they saw similar behavior, but that it seemed intermittent
> > rather than consistent. I haven't seen this; typically the FileChannel is
> > very fast with replay. Any update on this?
> >
> > Thanks,
> > Mike
> >
> >
> >
> > On Mon, Jun 17, 2013 at 3:53 PM, Edward Sargisson <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Hi all,
> > > This may be a user question, so feel free to punt me to that list. However,
> > > I've just seen behaviour which seems mighty slow and I don't understand
> > > why.
> > >
> > > I restarted one of our Flume agents and it took about 23 minutes before it
> > > was ready to accept new events. The logs seem to indicate that it took the
> > > majority of that time to work through the data file that only had 6885
> > > events in it. This seems mighty slow to me.
> > >
> > > Does anybody have an explanation for this? Is there something I should do
> > > in the future to bring it back up faster? I looked at the code and there's
> > > nothing obviously slow about it.
> > >
> > > Many thanks,
> > > Edward
> > >
> > > Log snippet (filtered to this thread only, with a large number of Pending
> > > take messages removed):
> > > 2013-06-17 21:53:20,154  INFO [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.FileChannel Starting FileChannel troubleshootingFileChannel {
> > > dataDirs: [/var/local/flume/troubleshooting-file-channel/data] }...
> > > 2013-06-17 21:53:20,155  INFO [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.Log Encryption is not enabled
> > > 2013-06-17 21:53:20,155  INFO [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.Log Replay started
> > > 2013-06-17 21:53:20,165  INFO [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.Log Found NextFileID 20, from
> > > [/var/local/flume/troubleshooting-file-channel/data/log-20,
> > > /var/local/flume/troubleshooting-file-channel/data/log-19]
> > > 2013-06-17 21:53:20,172  INFO [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.EventQueueBackingStoreFileV3 Starting up with
> > > /var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint and
> > > /var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint.meta
> > > 2013-06-17 21:53:20,172  INFO [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.EventQueueBackingStoreFileV3 Reading checkpoint metadata from
> > > /var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint.meta
> > > 2013-06-17 21:53:20,213  INFO [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.Log Last Checkpoint Mon Jun 17 21:04:26 UTC 2013, queue depth = 0
> > > 2013-06-17 21:53:20,222  INFO [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.Log Replaying logs with v2 replay logic
> > > 2013-06-17 21:53:20,225  INFO [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.ReplayHandler Starting replay of
> > > [/var/local/flume/troubleshooting-file-channel/data/log-19,
> > > /var/local/flume/troubleshooting-file-channel/data/log-20]
> > > 2013-06-17 21:53:20,226  INFO [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.ReplayHandler Replaying
> > > /var/local/flume/troubleshooting-file-channel/data/log-19
> > > 2013-06-17 21:53:20,275  WARN [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.LogFile Checkpoint for
> > > file(/var/local/flume/troubleshooting-file-channel/data/log-19) is:
> > > 1371488755062, which is beyond the requested checkpoint time: 0 and
> > > position 284327361
> > > 2013-06-17 21:53:20,287  INFO [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.ReplayHandler Replaying
> > > /var/local/flume/troubleshooting-file-channel/data/log-20
> > > 2013-06-17 21:53:20,288  WARN [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.LogFile Checkpoint for