Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Flume >> mail # dev >> Slow behaviour on replay from FileChannel


Copy link to this message
-
Slow behaviour on replay from FileChannel
Hi all,
This may be a user question so feel free to punt me to that list. However,
I've just seen behaviour which seems mighty slow and I don't understand why.

I restarted one of our Flume agents and it took about 23 minutes before it
was ready to accept new events. The logs seem to indicate that it took the
majority of that time to workthrough the data file that only had 6885
events in it. This seems mighty slow to me.

Does anybody have an explanation for this? Is there something I should do
in the future to bring it back up faster? I looked at the code and there's
nothing obviously slow about it.

Many thanks,
Edward

Log snippet (filtered to be only this thread and large number of Pending
take messages removed):
2013-06-17 21:53:20,154  INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.FileChannel Starting FileChannel troubleshootingFileChannel {
dataDirs: [/var/local/flume/troubleshooting-file-channel/data] }...
2013-06-17 21:53:20,155  INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.Log Encryption is not enabled
2013-06-17 21:53:20,155  INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.Log Replay started
2013-06-17 21:53:20,165  INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.Log Found NextFileID 20, from
[/var/local/flume/troubleshooting-file-channel/data/log-20,
/var/local/flume/troubleshooting-file-channel/data/log-19]
2013-06-17 21:53:20,172  INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.EventQueueBackingStoreFileV3 Starting up with
/var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint and
/var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint.meta
2013-06-17 21:53:20,172  INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.EventQueueBackingStoreFileV3 Reading checkpoint metadata from
/var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint.meta
2013-06-17 21:53:20,213  INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.Log Last Checkpoint Mon Jun 17 21:04:26 UTC 2013, queue depth = 0
2013-06-17 21:53:20,222  INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.Log Replaying logs with v2 replay logic
2013-06-17 21:53:20,225  INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.ReplayHandler Starting replay of
[/var/local/flume/troubleshooting-file-channel/data/log-19,
/var/local/flume/troubleshooting-file-channel/data/log-20]
2013-06-17 21:53:20,226  INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.ReplayHandler Replaying
/var/local/flume/troubleshooting-file-channel/data/log-19
2013-06-17 21:53:20,275  WARN [lifecycleSupervisor-1-1]
o.a.f.c.f.LogFile Checkpoint for
file(/var/local/flume/troubleshooting-file-channel/data/log-19) is:
1371488755062, which is beyond the requested checkpoint time: 0 and
position 284327361
2013-06-17 21:53:20,287  INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.ReplayHandler Replaying
/var/local/flume/troubleshooting-file-channel/data/log-20
2013-06-17 21:53:20,288  WARN [lifecycleSupervisor-1-1]
o.a.f.c.f.LogFile Checkpoint for
file(/var/local/flume/troubleshooting-file-channel/data/log-20) is:
1371488770226, which is beyond the requested checkpoint time: 0 and
position 7078049
2013-06-17 22:16:16,161  INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.LogFile Encountered EOF at 284348767 in
/var/local/flume/troubleshooting-file-channel/data/log-19
2013-06-17 22:16:37,802  INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.LogFile Encountered EOF at 7266532 in
/var/local/flume/troubleshooting-file-channel/data/log-20
2013-06-17 22:16:37,805  INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.ReplayHandler read: 3133788, put: 434464, take: 2618590,
rollback: 15872, commit: 64862, skip: 0, eventCount:1535585
2013-06-17 22:16:37,805 DEBUG [lifecycleSupervisor-1-1]
o.a.f.c.f.ReplayHandler Pending take FlumeEventPointer [fileID=15,
offset=410595]

...6883 similar messages...

2013-06-17 22:16:48,465 DEBUG [lifecycleSupervisor-1-1]
o.a.f.c.f.ReplayHandler Pending take FlumeEventPointer [fileID=18,
offset=2579603]
2013-06-17 22:16:48,465  INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.Log Rolling /var/local/flume/troubleshooting-file-channel/data
2013-06-17 22:16:48,466  INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.Log Roll start /var/local/flume/troubleshooting-file-channel/data
2013-06-17 22:16:48,466  INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.LogFile Opened
/var/local/flume/troubleshooting-file-channel/data/log-21
2013-06-17 22:16:48,644  INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.Log Roll end
2013-06-17 22:16:48,645  INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.EventQueueBackingStoreFile Start checkpoint for
/var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint,
elements to sync = 383705
2013-06-17 22:16:48,756  INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.EventQueueBackingStoreFile Updating checkpoint metadata:
logWriteOrderID: 1371506000194, queueSize: 4, queueHead: 383699
2013-06-17 22:16:49,501  INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.LogFileV3 Updating log-21.meta currentPosition = 0,
logWriteOrderID = 1371506000194
2013-06-17 22:16:54,333  INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.Log Updated checkpoint for file:
/var/local/flume/troubleshooting-file-channel/data/log-21 position: 0
logWriteOrderID: 1371506000194
2013-06-17 22:16:54,333  INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.LogFile Closing RandomReader
/var/local/flume/troubleshooting-file-channel/data/log-20
2013-06-17 22:16:54,339  INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.LogFileV3 Updating log-20.meta currentPosition = 7078049,
logWriteOrderID = 1371506000194
2013-06-17 22:16:54,887 DEBUG [lifecycleSupervisor-1-1]
o.a.f.c.f.Log Updated checkpoint for file:
/var/local/flume/troubleshooting-file-channel/data/log-20logWriteOrderID
1371506000194
2013-06-17 22:16:54,887  INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.FileChannel Queue Size after replay: 4
[channel=troubleshootingFileChannel]
2013-06-17 22:16:54,887  INFO [lifecycleSupervisor-1-1]
o.a.f.i.MonitoredCounterGroup Monitoried counter group for type: CHANNEL,
name: troubleshootingFileChannel, registered successfully.
2013-06-17 22:16:54,887  INFO [lifecycleSupervisor-1-1]
o.a.f.i.MonitoredCounterGroup Component type: CHANNEL, name:
t
+
Mike Percy 2013-06-23, 10:02
+
Hari Shreedharan 2013-06-23, 10:05
+
Mike Percy 2013-06-23, 10:10
+
Edward Sargisson 2013-06-23, 13:27