Flume, mail # dev - Review Request 16107: FLUME-2155: Improve replay time


Re: Review Request 16107: FLUME-2155: Improve replay time
Hari Shreedharan 2013-12-10, 21:27


> On Dec. 10, 2013, 9:10 p.m., Hari Shreedharan wrote:
> > Brock:
> >
> > Looks like this patch is causing test failures:
> >
> > Failed tests:   testRestartWhenMetaDataExistsButCheckpointDoesNotWithBackup(org.apache.flume.channel.file.TestFileChannelRestart)
> >   testRestartWhenCheckpointExistsButMetaDoesNotWithBackup(org.apache.flume.channel.file.TestFileChannelRestart)
> >   testRestartWhenNoCheckpointExistsWithBackup(org.apache.flume.channel.file.TestFileChannelRestart)
> >   testBadCheckpointVersionWithBackup(org.apache.flume.channel.file.TestFileChannelRestart)
> >   testBadCheckpointMetaVersionWithBackup(org.apache.flume.channel.file.TestFileChannelRestart)
> >   testDifferingOrderIDCheckpointAndMetaVersionWithBackup(org.apache.flume.channel.file.TestFileChannelRestart)
> >   testIncompleteCheckpointWithCheckpoint(org.apache.flume.channel.file.TestFileChannelRestart)
> >   testCorruptInflightPutsWithBackup(org.apache.flume.channel.file.TestFileChannelRestart)
> >   testCorruptInflightTakesWithBackup(org.apache.flume.channel.file.TestFileChannelRestart)
> >   testTruncatedCheckpointMetaWithBackup(org.apache.flume.channel.file.TestFileChannelRestart)
> >   testCorruptCheckpointMetaWithBackup(org.apache.flume.channel.file.TestFileChannelRestart)
> >   testBackupUsedEnsureNoFullReplay(org.apache.flume.channel.file.TestFileChannelRestart)
> >
> >

Looks like the reason for this is an error coming from the backup:
2013-12-10 13:06:10,425 (main) [INFO - org.apache.flume.channel.file.EventQueueBackingStoreFile.startBackupThread(EventQueueBackingStoreFile.java:275)] Attempting to back up checkpoint.
2013-12-10 13:06:10,426 ([channel=FileChannel-78cfe3ea-5dd1-4fe7-80a3-bd358fca3a70] - CheckpointBackUpThread) [INFO - org.apache.flume.channel.file.Serialization.deleteAllFiles(Serialization.java:105)] Skipping in_use.lock because it is in excludes set
2013-12-10 13:06:10,426 ([channel=FileChannel-78cfe3ea-5dd1-4fe7-80a3-bd358fca3a70] - CheckpointBackUpThread) [INFO - org.apache.flume.channel.file.Serialization.deleteAllFiles(Serialization.java:118)] Deleted the following files: , checkpoint, checkpoint.meta, inflightputs, inflighttakes.
2013-12-10 13:06:10,429 (main) [INFO - org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:1020)] Updated checkpoint for file: /var/folders/yn/g7q3wr0n6891lckwvn01s9080000gn/T/1386709570074-0/data1/log-1 position: 3854 logWriteOrderID: 1386709677565
2013-12-10 13:06:10,431 ([channel=FileChannel-78cfe3ea-5dd1-4fe7-80a3-bd358fca3a70] - CheckpointBackUpThread) [ERROR - org.apache.flume.channel.file.Serialization.copyFile(Serialization.java:158)] Error while attempting to copy /var/folders/yn/g7q3wr0n6891lckwvn01s9080000gn/T/1386709570074-0/chkpt/queueset to /var/folders/yn/g7q3wr0n6891lckwvn01s9080000gn/T/1386709570074-0/backup/queueset.
java.io.FileNotFoundException: /var/folders/yn/g7q3wr0n6891lckwvn01s9080000gn/T/1386709570074-0/chkpt/queueset (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:120)
        at org.apache.flume.channel.file.Serialization.copyFile(Serialization.java:141)
        at org.apache.flume.channel.file.EventQueueBackingStoreFile.backupCheckpoint(EventQueueBackingStoreFile.java:172)
        at org.apache.flume.channel.file.EventQueueBackingStoreFile$1.run(EventQueueBackingStoreFile.java:282)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:695)

This seems to be because the backup happens in a different thread, which lists the files and calls the delete method on each one, while another thread deletes the queueset file. We should add queueset to the EXCLUDES list.
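
As a rough illustration of the suggestion above, here is a minimal sketch of a copy loop that skips names in an excludes set and tolerates a file vanishing between the directory listing and the copy. The class BackupCopySketch and the method copyAllExcept are hypothetical and are not the actual Flume Serialization API; only the queueset file name and the idea of an excludes set come from this thread. It also assumes java.nio.file (Java 7+), which may not match the JDK the tests above ran on.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.StandardCopyOption;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; not the real org.apache.flume.channel.file.Serialization code.
class BackupCopySketch {

  // Copies every regular file in srcDir to destDir, skipping names in excludes.
  // Returns the names that were actually copied.
  static Set<String> copyAllExcept(File srcDir, File destDir, Set<String> excludes)
      throws IOException {
    Set<String> copied = new HashSet<String>();
    File[] files = srcDir.listFiles();
    if (files == null) {
      return copied; // srcDir missing or not a directory
    }
    for (File f : files) {
      if (!f.isFile() || excludes.contains(f.getName())) {
        continue; // excluded names are never touched by the backup thread
      }
      try {
        Files.copy(f.toPath(), new File(destDir, f.getName()).toPath(),
            StandardCopyOption.REPLACE_EXISTING);
        copied.add(f.getName());
      } catch (NoSuchFileException e) {
        // Another thread removed the file after it was listed; skip it
        // instead of failing the whole backup.
      }
    }
    return copied;
  }

  public static void main(String[] args) throws IOException {
    File src = Files.createTempDirectory("chkpt").toFile();
    File dest = Files.createTempDirectory("backup").toFile();
    new File(src, "checkpoint").createNewFile();
    new File(src, "queueset").createNewFile();
    Set<String> excludes = Collections.singleton("queueset");
    System.out.println("Copied: " + copyAllExcept(src, dest, excludes));
  }
}
```

Excluding the name avoids the race for that one file; catching NoSuchFileException is an extra guard for any other file that disappears mid-backup, and whether that is appropriate for the checkpoint files is a separate question.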
- Hari
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16107/#review30128
On Dec. 10, 2013, 2:58 p.m., Brock Noland wrote: