Re: Question about gzip compression when using Flume Ng
Try adding:

collector102.sinks.sink1.hdfs.writeFormat = TEXT
collector102.sinks.sink2.hdfs.writeFormat = TEXT

- Connor
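
A quick way to sanity-check this fix is to read one of the rolled files back: hadoop fs -text decompresses gzip output based on the file extension and should print readable log lines. The bucket path and file-name pattern here are assumptions pieced together from the config quoted below, not from the original thread:

hadoop fs -text hdfs://namenode301.ngpipes.milp.ngmoco.com:9000/ngpipes-raw-logs/2013-01-14/1400/collector102*.gz | head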
On Mon, Jan 14, 2013 at 2:34 PM, Sagar Mehta <[EMAIL PROTECTED]> wrote:

> Yeah sure!!
>
> smehta@collector102:/opt/flume/conf$ cat hdfs.conf
> # hdfs.conf: This is a configuration file that configures Flume NG to use
> # An exec source to get a live tail of the jetty logFile
> # An hdfs sink to write events to the hdfs on the test cluster
> # A file based channel to connect the above source and sink
>
> # Name the components on this agent
> collector102.sources = source1
> collector102.sinks = sink1 sink2
> collector102.channels = channel1 channel2
>
> # Configure the source
> collector102.sources.source1.type = exec
> collector102.sources.source1.command = tail -F /opt/jetty/logFile.log
>
> # Configure the interceptors
> collector102.sources.source1.interceptors = TimestampInterceptor HostInterceptor
>
> # We use the Timestamp interceptor to stamp each event with the time at which Flume receives it.
> # This is used for figuring out the bucket to which an event goes.
> collector102.sources.source1.interceptors.TimestampInterceptor.type = timestamp
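
As a worked example of the bucketing (using this mail's own send time, and assuming the agent's local timezone): an event received at 2:34 PM on 2013-01-14 gets its timestamp header set on receipt, so the %Y-%m-%d/%H00 escapes in the hdfs.path below resolve it to the bucket

hdfs://namenode301.ngpipes.milp.ngmoco.com:9000/ngpipes-raw-logs/2013-01-14/1400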
>
> # We use the Host interceptor to populate the host header with the fully qualified domain name of the collector.
> # That way we know which file in the sink represents which collector.
> collector102.sources.source1.interceptors.HostInterceptor.type = org.apache.flume.interceptor.HostInterceptor$Builder
> collector102.sources.source1.interceptors.HostInterceptor.preserveExisting = false
> collector102.sources.source1.interceptors.HostInterceptor.useIP = false
> collector102.sources.source1.interceptors.HostInterceptor.hostHeader = host
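
With useIP = false the host header carries the collector's fully qualified domain name, so the %{host} file prefix used by the sinks below yields one file per collector, named roughly like the following (the FQDN and the numeric counter are illustrative, not taken from the thread):

collector102.ngpipes.milp.ngmoco.com.1358202840000.gz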
>
> # Configure the sink
>
> collector102.sinks.sink1.type = hdfs
>
> # Configure the bucketing
> collector102.sinks.sink1.hdfs.path = hdfs://namenode301.ngpipes.milp.ngmoco.com:9000/ngpipes-raw-logs/%Y-%m-%d/%H00
>
> # Prefix the file with the source so that we know where the events in the file came from.
> collector102.sinks.sink1.hdfs.filePrefix = %{host}
>
> # We roll the Flume output file based on a time interval - currently every 5 minutes.
> collector102.sinks.sink1.hdfs.rollSize = 0
> collector102.sinks.sink1.hdfs.rollCount = 0
> collector102.sinks.sink1.hdfs.rollInterval = 300
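
With rollSize and rollCount both set to 0, size- and count-based rolling are disabled, so the 300-second interval is the only roll trigger; each hourly bucket should therefore accumulate about 12 files per collector (3600 s / 300 s).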
>
> # gzip compression-related settings
> collector102.sinks.sink1.hdfs.codeC = gzip
> collector102.sinks.sink1.hdfs.fileType = CompressedStream
> collector102.sinks.sink1.hdfs.fileSuffix = .gz
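
For later readers: this is the stanza that Connor's suggestion at the top of the message patches. With the writeFormat line folded in, it would read as follows (a sketch only, using the exact value from his reply):

collector102.sinks.sink1.hdfs.codeC = gzip
collector102.sinks.sink1.hdfs.fileType = CompressedStream
collector102.sinks.sink1.hdfs.fileSuffix = .gz
collector102.sinks.sink1.hdfs.writeFormat = TEXT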
>
> # Configure the sink
>
> collector102.sinks.sink2.type = hdfs
>
> # Configure the bucketing
> collector102.sinks.sink2.hdfs.path = hdfs://namenode5001.ngpipes.sac.ngmoco.com:9000/ngpipes-raw-logs/%Y-%m-%d/%H00
>
> # Prefix the file with the source so that we know where the events in the file came from.
> collector102.sinks.sink2.hdfs.filePrefix = %{host}
>
> # We roll the Flume output file based on a time interval - currently every 5 minutes.
> collector102.sinks.sink2.hdfs.rollSize = 0
> collector102.sinks.sink2.hdfs.rollCount = 0
> collector102.sinks.sink2.hdfs.rollInterval = 300
> collector102.sinks.sink2.hdfs.fileType = DataStream
>
> # Configure the channel that connects the source to the sink
>
> # Use a channel which buffers events in filesystem
> collector102.channels.channel1.type = file
> collector102.channels.channel1.checkpointDir = /data/flume_data/channel1/checkpoint
> collector102.channels.channel1.dataDirs = /data/flume_data/channel1/data
>
> # Use a channel which buffers events in filesystem
> collector102.channels.channel2.type = file
> collector102.channels.channel2.checkpointDir = /data/flume_data/channel2/checkpoint
> collector102.channels.channel2.dataDirs = /data/flume_data/channel2/data
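
Both channels are file channels, so events queued for either sink are persisted under dataDirs (with queue state in checkpointDir) and survive an agent restart. If the collectors are ever back-pressured, the file channel also accepts capacity tuning, for example (illustrative values, not part of the original config):

collector102.channels.channel1.capacity = 1000000
collector102.channels.channel1.transactionCapacity = 10000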
>
> # Bind the source and sink to the channel configured above
> collector102.sources.source1.channels = channel1 channel2
> collector102.sinks.sink1.channel = channel1
> collector102.sinks.sink2.channel = channel2
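
Because source1 lists both channels, the default replicating channel selector copies every event into channel1 and channel2, which is how each HDFS cluster receives its own copy of the stream. Spelling the default out would look like this (optional, since replicating is already the default):

collector102.sources.source1.selector.type = replicating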
>
> On Mon, Jan 14, 2013 at 2:25 PM, Connor Woodson <[EMAIL PROTECTED]> wrote: