


Re: Question about gzip compression when using Flume Ng
Yeah sure!!

smehta@collector102:/opt/flume/conf$ cat hdfs.conf
# hdfs.conf: This configuration file configures Flume NG to use
# An exec source to get a live tail of the jetty logFile
# Two hdfs sinks to write events to hdfs on the test clusters
# Two file based channels to connect the above source and sinks

# Name the components on this agent
collector102.sources = source1
collector102.sinks = sink1 sink2
collector102.channels = channel1 channel2

# Configure the source
collector102.sources.source1.type = exec
collector102.sources.source1.command = tail -F /opt/jetty/logFile.log
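# (tail -F re-opens logFile.log if the file is rotated, unlike plain tail -f)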

# Configure the interceptors
collector102.sources.source1.interceptors = TimestampInterceptor HostInterceptor

# We use the Timestamp interceptor to get timestamps of when flume receives events
# This is used for figuring out the bucket to which an event goes
collector102.sources.source1.interceptors.TimestampInterceptor.type = timestamp

# We use the Host interceptor to populate the host header with the fully
# qualified domain name of the collector.
# That way we know which file in the sink represents which collector.
collector102.sources.source1.interceptors.HostInterceptor.type = org.apache.flume.interceptor.HostInterceptor$Builder
collector102.sources.source1.interceptors.HostInterceptor.preserveExisting = false
collector102.sources.source1.interceptors.HostInterceptor.useIP = false
collector102.sources.source1.interceptors.HostInterceptor.hostHeader = host
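# (The timestamp header set above drives the %Y-%m-%d/%H00 escapes in the
# hdfs.path settings below, and the host header fills in the %{host} filePrefix.)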

# Configure the first sink

collector102.sinks.sink1.type = hdfs

# Configure the bucketing
collector102.sinks.sink1.hdfs.path = hdfs://namenode301.ngpipes.milp.ngmoco.com:9000/ngpipes-raw-logs/%Y-%m-%d/%H00

# Prefix the file with the source so that we know where the events in the
# file came from
collector102.sinks.sink1.hdfs.filePrefix = %{host}

# We roll the flume output file based on time interval - currently every 5
# minutes
collector102.sinks.sink1.hdfs.rollSize = 0
collector102.sinks.sink1.hdfs.rollCount = 0
collector102.sinks.sink1.hdfs.rollInterval = 300
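# rollSize and rollCount are set to 0 to disable size- and count-based
# rolling, so only the 300 second interval triggers a file roll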

#gzip compression related settings
collector102.sinks.sink1.hdfs.codeC = gzip
collector102.sinks.sink1.hdfs.fileType = CompressedStream
collector102.sinks.sink1.hdfs.fileSuffix = .gz
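# Note: fileType CompressedStream is what makes hdfs.codeC apply here;
# a DataStream sink ignores codeC and writes uncompressed output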

# Configure the second sink

collector102.sinks.sink2.type = hdfs

# Configure the bucketing
collector102.sinks.sink2.hdfs.path = hdfs://namenode5001.ngpipes.sac.ngmoco.com:9000/ngpipes-raw-logs/%Y-%m-%d/%H00

# Prefix the file with the source so that we know where the events in the
# file came from
collector102.sinks.sink2.hdfs.filePrefix = %{host}

# We roll the flume output file based on time interval - currently every 5
# minutes
collector102.sinks.sink2.hdfs.rollSize = 0
collector102.sinks.sink2.hdfs.rollCount = 0
collector102.sinks.sink2.hdfs.rollInterval = 300
collector102.sinks.sink2.hdfs.fileType = DataStream
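# sink2 uses DataStream, so this copy of the events is written uncompressed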

# Configure the channel that connects the source to the sink

# Use a channel which buffers events in filesystem
collector102.channels.channel1.type = file
collector102.channels.channel1.checkpointDir = /data/flume_data/channel1/checkpoint
collector102.channels.channel1.dataDirs = /data/flume_data/channel1/data

# Use a channel which buffers events in filesystem
collector102.channels.channel2.type = file
collector102.channels.channel2.checkpointDir = /data/flume_data/channel2/checkpoint
collector102.channels.channel2.dataDirs = /data/flume_data/channel2/data
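# Each file channel gets its own checkpointDir and dataDirs; file channels
# on the same agent must not share directories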

# Bind the source and sink to the channel configured above
collector102.sources.source1.channels = channel1 channel2
collector102.sinks.sink1.channel = channel1
collector102.sinks.sink2.channel = channel2
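
FWIW, once a file rolls you can sanity-check the compressed output with
something like the following (the date/hour bucket below is just an example of
what the %Y-%m-%d/%H00 escapes expand to):

smehta@collector102:/opt/flume/conf$ hadoop fs -ls hdfs://namenode301.ngpipes.milp.ngmoco.com:9000/ngpipes-raw-logs/2013-01-14/1400
smehta@collector102:/opt/flume/conf$ hadoop fs -text hdfs://namenode301.ngpipes.milp.ngmoco.com:9000/ngpipes-raw-logs/2013-01-14/1400/*.gz | head
# hadoop fs -text decompresses gzip, so the second command should print plain log lines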

On Mon, Jan 14, 2013 at 2:25 PM, Connor Woodson <[EMAIL PROTECTED]> wrote:

> Can you post your full config?
>
> - Connor
>
>
> On Mon, Jan 14, 2013 at 11:18 AM, Sagar Mehta <[EMAIL PROTECTED]> wrote:
>
>> Hi Guys,
>>
>> I'm using Flume Ng and it works great for me. In essence I'm using an
>> exec source for doing a tail -F on a logfile and two HDFS sinks, each fed
>> by a File channel. So far so great - Now I'm trying to use gzip compression