Re: Question about gzip compression when using Flume Ng
Yeah sure!!

smehta@collector102:/opt/flume/conf$ cat hdfs.conf
# hdfs.conf: This is a configuration file that configures Flume NG to use
# An exec source to get a live tail of the jetty logFile
# An hdfs sink to write events to the hdfs on the test cluster
# A file based channel to connect the above source and sink

# Name the components on this agent
collector102.sources = source1
collector102.sinks = sink1 sink2
collector102.channels = channel1 channel2

# Configure the source
collector102.sources.source1.type = exec
collector102.sources.source1.command = tail -F /opt/jetty/logFile.log

# Configure the interceptors
collector102.sources.source1.interceptors = TimestampInterceptor HostInterceptor

# We use the Timestamp interceptor to get timestamps of when flume receives
# events
# This is used for figuring out the bucket to which an event goes
collector102.sources.source1.interceptors.TimestampInterceptor.type = timestamp

# We use the Host interceptor to populate the host header with the fully
# qualified domain name of the collector.
# That way we know which file in the sink represents which collector.
collector102.sources.source1.interceptors.HostInterceptor.type = org.apache.flume.interceptor.HostInterceptor$Builder
collector102.sources.source1.interceptors.HostInterceptor.preserveExisting = false
collector102.sources.source1.interceptors.HostInterceptor.useIP = false
collector102.sources.source1.interceptors.HostInterceptor.hostHeader = host

# Configure the sink

collector102.sinks.sink1.type = hdfs

# Configure the bucketing
collector102.sinks.sink1.hdfs.path = hdfs://namenode301.ngpipes.milp.ngmoco.com:9000/ngpipes-raw-logs/%Y-%m-%d/%H00

# Prefix the file with the source so that we know where the events in the
# file came from
collector102.sinks.sink1.hdfs.filePrefix = %{host}

# We roll the flume output file based on time interval - currently every 5
# minutes
collector102.sinks.sink1.hdfs.rollSize = 0
collector102.sinks.sink1.hdfs.rollCount = 0
collector102.sinks.sink1.hdfs.rollInterval = 300

# gzip compression related settings
collector102.sinks.sink1.hdfs.codeC = gzip
collector102.sinks.sink1.hdfs.fileType = CompressedStream
collector102.sinks.sink1.hdfs.fileSuffix = .gz
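# Note: fileType = CompressedStream is what makes codeC take effect; the
# default DataStream fileType writes uncompressed output and applies no codec.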

# Configure the sink

collector102.sinks.sink2.type = hdfs

# Configure the bucketing
collector102.sinks.sink2.hdfs.path = hdfs://namenode5001.ngpipes.sac.ngmoco.com:9000/ngpipes-raw-logs/%Y-%m-%d/%H00

# Prefix the file with the source so that we know where the events in the
# file came from
collector102.sinks.sink2.hdfs.filePrefix = %{host}

# We roll the flume output file based on time interval - currently every 5
# minutes
collector102.sinks.sink2.hdfs.rollSize = 0
collector102.sinks.sink2.hdfs.rollCount = 0
collector102.sinks.sink2.hdfs.rollInterval = 300
collector102.sinks.sink2.hdfs.fileType = DataStream
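# Note: unlike sink1, this sink sets no codeC, so its files land uncompressed.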

# Configure the channel that connects the source to the sink

# Use a channel which buffers events in filesystem
collector102.channels.channel1.type = file
collector102.channels.channel1.checkpointDir = /data/flume_data/channel1/checkpoint
collector102.channels.channel1.dataDirs = /data/flume_data/channel1/data

# Use a channel which buffers events in filesystem
collector102.channels.channel2.type = file
collector102.channels.channel2.checkpointDir = /data/flume_data/chann2/checkpoint
collector102.channels.channel2.dataDirs = /data/flume_data/channel2/data

# Bind the source and sink to the channel configured above
collector102.sources.source1.channels = channel1 channel2
collector102.sinks.sink1.channel = channel1
collector102.sinks.sink2.channel = channel2
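
For reference, a minimal sketch of how an agent with this config could be
started and its gzipped output spot-checked; the conf paths and the sample
bucket date below are illustrative, not taken from the setup above:

# The --name argument must match the component prefix (collector102)
$ flume-ng agent --conf /opt/flume/conf --conf-file /opt/flume/conf/hdfs.conf --name collector102

# List a bucket, then decompress a rolled file to verify the gzip output;
# 'hadoop fs -text' transparently gunzips .gz files
$ hadoop fs -ls /ngpipes-raw-logs/2013-01-14/1400
$ hadoop fs -text /ngpipes-raw-logs/2013-01-14/1400/*.gz | head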

On Mon, Jan 14, 2013 at 2:25 PM, Connor Woodson <[EMAIL PROTECTED]> wrote:

> Can you post your full config?
>
> - Connor
>
>
> On Mon, Jan 14, 2013 at 11:18 AM, Sagar Mehta <[EMAIL PROTECTED]> wrote:
>
>> Hi Guys,
>>
>> I'm using Flume NG and it works great for me. In essence I'm doing a
>> tail -F on a logfile with an exec source, feeding two HDFS sinks through
>> file channels. So far so great - now I'm trying to use gzip compression