Flume >> mail # dev >> Slow behaviour on replay from FileChannel


Re: Slow behaviour on replay from FileChannel
This was likely caused by a corrupt checkpoint being cleaned up
automatically, which forces a replay of all your data files. Can you try
enabling dual checkpoints? (You will need to use trunk or the upcoming 1.4
release for this feature, though.)

Hari
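
For reference, here is a minimal sketch of what enabling dual checkpoints might look like in the agent's properties file. It assumes the File Channel properties `useDualCheckpoints` and `backupCheckpointDir` as documented for Flume 1.4. The agent name `agent1` and the backup directory path are hypothetical; the channel name, checkpoint directory, and data directory are taken from the log snippet quoted below.

```properties
# Hypothetical agent name "agent1"; channel name taken from the quoted log.
agent1.channels = troubleshootingFileChannel
agent1.channels.troubleshootingFileChannel.type = file
agent1.channels.troubleshootingFileChannel.checkpointDir = /var/local/flume/troubleshooting-file-channel/checkpoint
agent1.channels.troubleshootingFileChannel.dataDirs = /var/local/flume/troubleshooting-file-channel/data

# Keep a backup copy of the checkpoint so that a corrupt primary checkpoint
# does not force a full replay of the data files.
agent1.channels.troubleshootingFileChannel.useDualCheckpoints = true
# Assumed path: must differ from both checkpointDir and dataDirs.
agent1.channels.troubleshootingFileChannel.backupCheckpointDir = /var/local/flume/troubleshooting-file-channel/checkpoint-backup
```

The backup directory should be a separate location from the checkpoint and data directories; the channel needs a restart to pick up the change.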

On Sunday, June 23, 2013, Mike Percy wrote:

> Edward,
> Someone told me they saw similar behavior, but that it seemed intermittent
> rather than consistent. I haven't seen this myself; typically the FC is very
> fast with replay. Any update on this?
>
> Thanks,
> Mike
>
>
>
> On Mon, Jun 17, 2013 at 3:53 PM, Edward Sargisson <[EMAIL PROTECTED]>
> wrote:
>
> > Hi all,
> > This may be a user question so feel free to punt me to that list. However,
> > I've just seen behaviour which seems mighty slow and I don't understand
> > why.
> >
> > I restarted one of our Flume agents and it took about 23 minutes before it
> > was ready to accept new events. The logs seem to indicate that it took the
> > majority of that time to work through the data file that only had 6885
> > events in it. This seems mighty slow to me.
> >
> > Does anybody have an explanation for this? Is there something I should do
> > in the future to bring it back up faster? I looked at the code and there's
> > nothing obviously slow about it.
> >
> > Many thanks,
> > Edward
> >
> > Log snippet (filtered to this thread only, with a large number of
> > "Pending take" messages removed):
> > 2013-06-17 21:53:20,154  INFO [lifecycleSupervisor-1-1]
> > o.a.f.c.f.FileChannel Starting FileChannel troubleshootingFileChannel {
> > dataDirs: [/var/local/flume/troubleshooting-file-channel/data] }...
> > 2013-06-17 21:53:20,155  INFO [lifecycleSupervisor-1-1]
> > o.a.f.c.f.Log Encryption is not enabled
> > 2013-06-17 21:53:20,155  INFO [lifecycleSupervisor-1-1]
> > o.a.f.c.f.Log Replay started
> > 2013-06-17 21:53:20,165  INFO [lifecycleSupervisor-1-1]
> > o.a.f.c.f.Log Found NextFileID 20, from
> > [/var/local/flume/troubleshooting-file-channel/data/log-20,
> > /var/local/flume/troubleshooting-file-channel/data/log-19]
> > 2013-06-17 21:53:20,172  INFO [lifecycleSupervisor-1-1]
> > o.a.f.c.f.EventQueueBackingStoreFileV3 Starting up with
> > /var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint and
> > /var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint.meta
> > 2013-06-17 21:53:20,172  INFO [lifecycleSupervisor-1-1]
> > o.a.f.c.f.EventQueueBackingStoreFileV3 Reading checkpoint metadata from
> > /var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint.meta
> > 2013-06-17 21:53:20,213  INFO [lifecycleSupervisor-1-1]
> > o.a.f.c.f.Log Last Checkpoint Mon Jun 17 21:04:26 UTC 2013, queue depth = 0
> > 2013-06-17 21:53:20,222  INFO [lifecycleSupervisor-1-1]
> > o.a.f.c.f.Log Replaying logs with v2 replay logic
> > 2013-06-17 21:53:20,225  INFO [lifecycleSupervisor-1-1]
> > o.a.f.c.f.ReplayHandler Starting replay of
> > [/var/local/flume/troubleshooting-file-channel/data/log-19,
> > /var/local/flume/troubleshooting-file-channel/data/log-20]
> > 2013-06-17 21:53:20,226  INFO [lifecycleSupervisor-1-1]
> > o.a.f.c.f.ReplayHandler Replaying
> > /var/local/flume/troubleshooting-file-channel/data/log-19
> > 2013-06-17 21:53:20,275  WARN [lifecycleSupervisor-1-1]
> > o.a.f.c.f.LogFile Checkpoint for
> > file(/var/local/flume/troubleshooting-file-channel/data/log-19) is:
> > 1371488755062, which is beyond the requested checkpoint time: 0 and
> > position 284327361
> > 2013-06-17 21:53:20,287  INFO [lifecycleSupervisor-1-1]
> > o.a.f.c.f.ReplayHandler Replaying
> > /var/local/flume/troubleshooting-file-channel/data/log-20
> > 2013-06-17 21:53:20,288  WARN [lifecycleSupervisor-1-1]
> > o.a.f.c.f.LogFile Checkpoint for
> > file(/var/local/flume/troubleshooting-file-channel/data/log-20) is:
> > 1371488770226, which is beyond the requested checkpoint time: 0 and
> > position 7078049
> > 2013-06-17 22:16:16,161  INFO [lifecycleSupervisor-1-1]
> > o.a.f.c.f.LogFile Encountered EOF at 284348767 in
> > /var/local/flume/troubleshooting-file-channel/data/log-19