Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Flume >> mail # dev >> Review Request: Patch to support custom FlumeFormatter implementations for writing HDFS SequenceFiles


Copy link to this message
-
Re: Review Request: Patch to support custom FlumeFormatter implementations for writing HDFS SequenceFiles

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6918/#review14699
-----------------------------------------------------------

Ship it!
+1 for rebased patch

- Mike Percy
On Sept. 14, 2012, 5:27 a.m., Chris Birchall wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6918/
> -----------------------------------------------------------
>
> (Updated Sept. 14, 2012, 5:27 a.m.)
>
>
> Review request for Flume.
>
>
> Description
> -------
>
> This patch allows users to customise the format of HDFS SequenceFiles by providing a custom FlumeFormatter implementation.
>
> Currently, the user can set hdfs.writeFormat to either "Text" or "Writable", corresponding to HDFSTextFormatter and HDFSWritableFormatter respectively.
>
> With this patch, hdfs.writeFormat can also be set to the full class name of a Builder implementation, e.g.:
>     agent_foo.sinks.hdfs-sink.writeFormat=com.mycompany.flume.MyCustomFormatter$Builder
>
> They can also pass custom configuration params to the builder, e.g.:
>     agent_foo.sinks.hdfs-sink.writeFormat.ignoreHeaders=foo,bar
>
> These params will be passed to the Builder's build() method as a Context object.
>
> I've tried to be as consistent as possible with the design of EventSerializerFactory:
> * Use an enum for the different formatter types, rather than static strings.
> * Use a Builder, rather than constructing a FlumeFormatter directly.
>
>
> This addresses bug FLUME-1100.
>     https://issues.apache.org/jira/browse/FLUME-1100
>
>
> Diffs
> -----
>
>   flume-ng-core/src/main/java/org/apache/flume/sink/FlumeFormatter.java 05d60b7
>   flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java 6408eb9
>   flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSCompressedDataStream.java 80341ef
>   flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSDataStream.java 2ac06d7
>   flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java 9a76ecb
>   flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSFormatterFactory.java 7b47942
>   flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSSequenceFile.java ac9104d
>   flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSTextFormatter.java 5839dbb
>   flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSWritableFormatter.java 9f03389
>   flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSWriter.java 6408e9b
>   flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/SeqFileFormatter.java PRE-CREATION
>   flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/SeqFileFormatterFactory.java PRE-CREATION
>   flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/SeqFileFormatterType.java PRE-CREATION
>   flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadDataStream.java 423f7c1
>   flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadSeqWriter.java a911ab7
>   flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MockHDFSWriter.java 24a7cbf
>   flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MyCustomFormatter.java PRE-CREATION
>   flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestBucketWriter.java bb12188
>   flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSCompressedDataStream.java f537732
>   flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestSeqFileFormatterFactory.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/6918/diff/
>
>
> Testing
> -------
>
> Unit tests included in patch.