

Brock Noland 2012-12-19, 01:54
Re: Review Request: FLUME-1632: Persist progress on each file in file spooling client/source


> On Dec. 19, 2012, 1:54 a.m., Brock Noland wrote:
> > Hi Mike!
> >
> > Great patch! There are a couple of issues below, but otherwise I think it's ready for a commit!

Thank you so much for your time reviewing this patch, Brock! I mostly tested with the source; now I see the CLI had some issues as well.
> On Dec. 19, 2012, 1:54 a.m., Brock Noland wrote:
> > flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java, line 59
> > <https://reviews.apache.org/r/8596/diff/4/?file=240885#file240885line59>
> >
> >     It's quite slow with such a large batch size: 4 minutes to transfer 60MB of data to a local agent (memory channel and null sink). It would be nice if that were configurable. That could be a follow-up JIRA, but it'd be nice to make this a command line option.

Yes, let's do this as a follow-up JIRA.
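
For reference, a minimal sketch of what a configurable batch size could look like. This is not the AvroCLIClient option handling from the patch: the `--batchSize` flag name, the default of 100, and the `BatchSizeOption` class are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: read a batch size from the command line and batch lines with it. */
final class BatchSizeOption {
  // Assumed default for illustration only; not taken from the patch.
  static final int DEFAULT_BATCH_SIZE = 100;

  /**
   * Returns the value following the hypothetical "--batchSize" flag.
   * Falls back to the default when the flag is absent or has no value after it.
   */
  static int parseBatchSize(String[] args) {
    for (int i = 0; i < args.length - 1; i++) {
      if ("--batchSize".equals(args[i])) {
        int size = Integer.parseInt(args[i + 1]);
        if (size <= 0) {
          throw new IllegalArgumentException("batchSize must be positive: " + size);
        }
        return size;
      }
    }
    return DEFAULT_BATCH_SIZE;
  }

  /** Splits the input into consecutive batches of at most batchSize elements. */
  static List<List<String>> toBatches(List<String> lines, int batchSize) {
    List<List<String>> batches = new ArrayList<>();
    for (int start = 0; start < lines.size(); start += batchSize) {
      int end = Math.min(start + batchSize, lines.size());
      batches.add(new ArrayList<>(lines.subList(start, end)));
    }
    return batches;
  }
}
```

With something like this, the batch size used for each send could be tuned per run instead of being fixed in the client.
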
> On Dec. 19, 2012, 1:54 a.m., Brock Noland wrote:
> > flume-ng-core/src/main/java/org/apache/flume/client/avro/ReliableSpoolingFileEventReader.java, line 224
> > <https://reviews.apache.org/r/8596/diff/4/?file=240890#file240890line224>
> >
> >     It's possible for currentFile to be absent.

Oops, yep.
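
A minimal sketch of the kind of guard this calls for, assuming `currentFile` is held in an optional wrapper. The class name is hypothetical and `java.util.Optional` is used here only to keep the example self-contained; the wrapper type in the patch may differ.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Optional;

/** Hypothetical sketch: close() must tolerate the case where no file was ever opened. */
final class OptionalCloseSketch implements Closeable {
  // Empty until a file is picked up; may still be empty when close() is called.
  private Optional<Closeable> currentFile = Optional.empty();

  void open(Closeable file) {
    currentFile = Optional.of(file);
  }

  @Override
  public void close() throws IOException {
    // Only touch the file if one is actually present.
    if (currentFile.isPresent()) {
      currentFile.get().close();
      currentFile = Optional.empty();
    }
  }
}
```
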
> On Dec. 19, 2012, 1:54 a.m., Brock Noland wrote:
> > flume-ng-core/src/main/java/org/apache/flume/serialization/LineDeserializer.java, line 107
> > <https://reviews.apache.org/r/8596/diff/4/?file=240901#file240901line107>
> >
> >     Seeing the error below. Guessing it's because the last file during a run will be closed twice: once in retireCurrentFile and once in the close method. This was doubly ugly for me because it hid an exception being thrown in retireCurrentFile due to a file name violation.
> >    
> >    
> >     java.nio.channels.ClosedChannelException
> >     at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:88)
> >     at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:265)
> >     at org.apache.flume.serialization.ResettableFileInputStream.seek(ResettableFileInputStream.java:212)
> >     at org.apache.flume.serialization.ResettableFileInputStream.reset(ResettableFileInputStream.java:204)
> >     at org.apache.flume.serialization.LineDeserializer.reset(LineDeserializer.java:102)
> >     at org.apache.flume.serialization.LineDeserializer.close(LineDeserializer.java:107)
> >     at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.close(ReliableSpoolingFileEventReader.java:224)
> >     at org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:217)
> >     at org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:71)
> >    
> >

I improved the close() handling both in the ResettableFileInputStream and in the LineDeserializer to address this issue.
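
To illustrate the idea (not the actual change in the patch), here is a minimal sketch of a close() that is safe to call twice. It mirrors the reset-then-close order visible in the stack trace above, but skips both steps once the reader is already closed, so a second close() never seeks on a closed channel. The class and field names are hypothetical.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of a file reader whose close() is idempotent. */
final class IdempotentChannelReader implements Closeable {
  private final FileChannel channel;
  private long markedPosition = 0;
  private boolean closed = false;

  IdempotentChannelReader(Path file) throws IOException {
    this.channel = FileChannel.open(file, StandardOpenOption.READ);
  }

  /** Remembers the current position as the last committed one. */
  void mark() throws IOException {
    ensureOpen();
    markedPosition = channel.position();
  }

  /** Seeks back to the last committed position. */
  void reset() throws IOException {
    ensureOpen();
    channel.position(markedPosition);
  }

  @Override
  public void close() throws IOException {
    if (closed) {
      return; // second and later calls are no-ops
    }
    try {
      channel.position(markedPosition); // roll back uncommitted reads first
    } finally {
      closed = true;
      channel.close();
    }
  }

  private void ensureOpen() throws IOException {
    if (closed) {
      throw new IOException("reader is closed");
    }
  }
}
```

This matches the contract documented for `java.io.Closeable`, where closing an already closed stream has no effect.
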
- Mike
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8596/#review14708
-----------------------------------------------------------
On Dec. 19, 2012, 12:37 p.m., Mike Percy wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8596/
> -----------------------------------------------------------
>
> (Updated Dec. 19, 2012, 12:37 p.m.)
>
>
> Review request for Flume.
>
>
> Description
> -------
>
> Defines EventDeserializer interface and uses it from the spooling source. Progress is persisted as bytes are read from the underlying file.
>
>
> This addresses bug FLUME-1632.
>     https://issues.apache.org/jira/browse/FLUME-1632
>
>
> Diffs
> -----
>
>   flume-ng-core/pom.xml 0224519
>   flume-ng-core/src/main/avro/TransferStateFileMeta.avsc PRE-CREATION
>   flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 37e9ffa
>   flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java 718e1b2
>   flume-ng-core/src/main/java/org/apache/flume/client/avro/EventReader.java PRE-CREATION
>   flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java 904f22c