Flume >> mail # dev >> Slow behaviour on replay from FileChannel


Edward Sargisson 2013-06-17, 22:53
Mike Percy 2013-06-23, 10:02
Hari Shreedharan 2013-06-23, 10:05
Re: Slow behaviour on replay from FileChannel
Good point Hari. I think a log file snippet would help to clarify.

Mike
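
For reference, the dual checkpoint setting Hari suggests in the reply quoted below could look roughly like this in the agent's properties file. This is only a sketch: the agent name `agent` and the backup directory path are assumptions, while the channel name and the other paths are taken from the log snippet further down. `useDualCheckpoints` and `backupCheckpointDir` are the FileChannel properties that ship with the 1.4 release.

```properties
# Hypothetical agent name "agent"; channel name taken from the log snippet.
agent.channels.troubleshootingFileChannel.type = file
agent.channels.troubleshootingFileChannel.checkpointDir = /var/local/flume/troubleshooting-file-channel/checkpoint
agent.channels.troubleshootingFileChannel.dataDirs = /var/local/flume/troubleshooting-file-channel/data

# Keep a second copy of the checkpoint so a corrupt primary does not force a full replay.
agent.channels.troubleshootingFileChannel.useDualCheckpoints = true
# Assumed path; it must differ from checkpointDir and from every entry in dataDirs.
agent.channels.troubleshootingFileChannel.backupCheckpointDir = /var/local/flume/troubleshooting-file-channel/checkpoint-backup
```
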
On Sun, Jun 23, 2013 at 3:05 AM, Hari Shreedharan <[EMAIL PROTECTED]> wrote:

> This was likely due to the checkpoint being corrupt and automatically being
> cleaned up, causing a replay of all your files. Can you try enabling dual
> checkpoints? (You will need to use trunk or the upcoming 1.4 release for
> this feature, though.)
>
> Hari
>
> On Sunday, June 23, 2013, Mike Percy wrote:
>
> > Edward,
> > Someone told me they saw similar behavior, but that it seemed intermittent /
> > not consistent. I haven't seen this; typically the FC is very fast with
> > replay. Any update on this?
> >
> > Thanks,
> > Mike
> >
> >
> >
> > On Mon, Jun 17, 2013 at 3:53 PM, Edward Sargisson <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Hi all,
> > > This may be a user question, so feel free to punt me to that list. However,
> > > I've just seen behaviour which seems mighty slow and I don't understand
> > > why.
> > >
> > > I restarted one of our Flume agents and it took about 23 minutes before it
> > > was ready to accept new events. The logs seem to indicate that it took the
> > > majority of that time to work through the data file that only had 6885
> > > events in it. This seems mighty slow to me.
> > >
> > > Does anybody have an explanation for this? Is there something I should do
> > > in the future to bring it back up faster? I looked at the code and there's
> > > nothing obviously slow about it.
> > >
> > > Many thanks,
> > > Edward
> > >
> > > Log snippet (filtered to only this thread, with a large number of Pending
> > > take messages removed):
> > > 2013-06-17 21:53:20,154  INFO [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.FileChannel Starting FileChannel troubleshootingFileChannel {
> > > dataDirs: [/var/local/flume/troubleshooting-file-channel/data] }...
> > > 2013-06-17 21:53:20,155  INFO [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.Log Encryption is not enabled
> > > 2013-06-17 21:53:20,155  INFO [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.Log Replay started
> > > 2013-06-17 21:53:20,165  INFO [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.Log Found NextFileID 20, from
> > > [/var/local/flume/troubleshooting-file-channel/data/log-20,
> > > /var/local/flume/troubleshooting-file-channel/data/log-19]
> > > 2013-06-17 21:53:20,172  INFO [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.EventQueueBackingStoreFileV3 Starting up with
> > > /var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint and
> > > /var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint.meta
> > > 2013-06-17 21:53:20,172  INFO [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.EventQueueBackingStoreFileV3 Reading checkpoint metadata from
> > > /var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint.meta
> > > 2013-06-17 21:53:20,213  INFO [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.Log Last Checkpoint Mon Jun 17 21:04:26 UTC 2013, queue depth = 0
> > > 2013-06-17 21:53:20,222  INFO [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.Log Replaying logs with v2 replay logic
> > > 2013-06-17 21:53:20,225  INFO [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.ReplayHandler Starting replay of
> > > [/var/local/flume/troubleshooting-file-channel/data/log-19,
> > > /var/local/flume/troubleshooting-file-channel/data/log-20]
> > > 2013-06-17 21:53:20,226  INFO [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.ReplayHandler Replaying
> > > /var/local/flume/troubleshooting-file-channel/data/log-19
> > > 2013-06-17 21:53:20,275  WARN [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.LogFile Checkpoint for
> > > file(/var/local/flume/troubleshooting-file-channel/data/log-19) is:
> > > 1371488755062, which is beyond the requested checkpoint time: 0 and
> > > position 284327361
> > > 2013-06-17 21:53:20,287  INFO [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.ReplayHandler Replaying
> > > /var/local/flume/troubleshooting-file-channel/data/log-20
> > > 2013-06-17 21:53:20,288  WARN [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.LogFile Checkpoint for
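
To narrow down which replay step is eating the time, it may help to diff consecutive timestamps in a log like the one quoted above. The following is a rough sketch, not part of Flume: the helper names `parse_entries` and `largest_gaps` are made up for illustration, and the sample is four entries copied from the snippet in this thread. It assumes each entry starts with a `yyyy-MM-dd HH:mm:ss,SSS` timestamp and that wrapped lines belong to the preceding entry.

```python
import re
from datetime import datetime

# Matches the start of an entry: timestamp, level, [thread], optional message.
ENTRY = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+(\w+)\s+\[([^\]]+)\]\s*(.*)"
)

def parse_entries(text):
    """Return a list of [timestamp, message] pairs, joining wrapped lines."""
    entries = []
    for raw in text.splitlines():
        # Drop mail quote markers such as "> > > " before matching.
        line = re.sub(r"^(?:>\s?)+", "", raw).strip()
        match = ENTRY.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            entries.append([stamp, match.group(4)])
        elif entries and line:
            # Continuation of the previous entry (the mail client wrapped it).
            entries[-1][1] = (entries[-1][1] + " " + line).strip()
    return entries

def largest_gaps(entries, top=3):
    """Return (gap, message of the entry before the gap), largest gap first."""
    gaps = [(later[0] - earlier[0], earlier[1])
            for earlier, later in zip(entries, entries[1:])]
    return sorted(gaps, key=lambda gap: gap[0], reverse=True)[:top]

# Four entries copied from the log snippet in this thread.
SAMPLE = """\
> > > 2013-06-17 21:53:20,225  INFO [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.ReplayHandler Starting replay of
> > > [/var/local/flume/troubleshooting-file-channel/data/log-19,
> > > /var/local/flume/troubleshooting-file-channel/data/log-20]
> > > 2013-06-17 21:53:20,226  INFO [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.ReplayHandler Replaying
> > > /var/local/flume/troubleshooting-file-channel/data/log-19
> > > 2013-06-17 21:53:20,275  WARN [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.LogFile Checkpoint for
> > > file(/var/local/flume/troubleshooting-file-channel/data/log-19) is:
> > > 1371488755062, which is beyond the requested checkpoint time: 0 and
> > > position 284327361
> > > 2013-06-17 21:53:20,287  INFO [lifecycleSupervisor-1-1]
> > > o.a.f.c.f.ReplayHandler Replaying
> > > /var/local/flume/troubleshooting-file-channel/data/log-20
"""

if __name__ == "__main__":
    for gap, message in largest_gaps(parse_entries(SAMPLE)):
        print(gap, "after:", message)
```

Run against the full agent log rather than this excerpt, the largest gaps should point at the step where most of the 23 minutes was spent.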
Edward Sargisson 2013-06-23, 13:27