Flume user mailing list >> Question about gzip compression when using Flume Ng


Sagar Mehta 2013-01-14, 19:18
Connor Woodson 2013-01-14, 22:25
Sagar Mehta 2013-01-14, 22:34
Re: Question about gzip compression when using Flume Ng
Try adding:

collector102.sinks.sink1.hdfs.writeFormat = TEXT
collector102.sinks.sink2.hdfs.writeFormat = TEXT

- Connor
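
For context, this is roughly how the suggested line would sit next to the compression settings already present in the hdfs.conf quoted below; sink2 would get the analogous line next to its own hdfs settings:

collector102.sinks.sink1.hdfs.codeC = gzip
collector102.sinks.sink1.hdfs.fileType = CompressedStream
collector102.sinks.sink1.hdfs.fileSuffix = .gz
collector102.sinks.sink1.hdfs.writeFormat = TEXT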
On Mon, Jan 14, 2013 at 2:34 PM, Sagar Mehta <[EMAIL PROTECTED]> wrote:

> Yeah sure!!
>
> smehta@collector102:/opt/flume/conf$ cat hdfs.conf
> # hdfs.conf: This is a configuration file that configures Flume NG to use
> # An exec source to get a live tail of the jetty logFile
> # An hdfs sink to write events to the hdfs on the test cluster
> # A file based channel to connect the above source and sink
>
> # Name the components on this agent
> collector102.sources = source1
> collector102.sinks = sink1 sink2
> collector102.channels = channel1 channel2
>
> # Configure the source
> collector102.sources.source1.type = exec
> collector102.sources.source1.command = tail -F /opt/jetty/logFile.log
>
> # Configure the interceptors
> collector102.sources.source1.interceptors = TimestampInterceptor HostInterceptor
>
> # We use the Timestamp interceptor to get timestamps of when flume receives events
> # This is used for figuring out the bucket to which an event goes
> collector102.sources.source1.interceptors.TimestampInterceptor.type = timestamp
>
> # We use the Host interceptor to populate the host header with the fully
> # qualified domain name of the collector.
> # That way we know which file in the sink represents which collector.
> collector102.sources.source1.interceptors.HostInterceptor.type = org.apache.flume.interceptor.HostInterceptor$Builder
> collector102.sources.source1.interceptors.HostInterceptor.preserveExisting = false
> collector102.sources.source1.interceptors.HostInterceptor.useIP = false
> collector102.sources.source1.interceptors.HostInterceptor.hostHeader = host
>
> # Configure the sink
>
> collector102.sinks.sink1.type = hdfs
>
> # Configure the bucketing
> collector102.sinks.sink1.hdfs.path = hdfs://namenode301.ngpipes.milp.ngmoco.com:9000/ngpipes-raw-logs/%Y-%m-%d/%H00
>
> # Prefix the file with the source so that we know where the events in the file came from
> collector102.sinks.sink1.hdfs.filePrefix = %{host}
>
> # We roll the flume output file based on time interval - currently every 5 minutes
> collector102.sinks.sink1.hdfs.rollSize = 0
> collector102.sinks.sink1.hdfs.rollCount = 0
> collector102.sinks.sink1.hdfs.rollInterval = 300
>
> #gzip compression related settings
> collector102.sinks.sink1.hdfs.codeC = gzip
> collector102.sinks.sink1.hdfs.fileType = CompressedStream
> collector102.sinks.sink1.hdfs.fileSuffix = .gz
>
> # Configure the sink
>
> collector102.sinks.sink2.type = hdfs
>
> # Configure the bucketing
> collector102.sinks.sink2.hdfs.path = hdfs://namenode5001.ngpipes.sac.ngmoco.com:9000/ngpipes-raw-logs/%Y-%m-%d/%H00
>
> # Prefix the file with the source so that we know where the events in the file came from
> collector102.sinks.sink2.hdfs.filePrefix = %{host}
>
> # We roll the flume output file based on time interval - currently every 5 minutes
> collector102.sinks.sink2.hdfs.rollSize = 0
> collector102.sinks.sink2.hdfs.rollCount = 0
> collector102.sinks.sink2.hdfs.rollInterval = 300
> collector102.sinks.sink2.hdfs.fileType = DataStream
>
> # Configure the channel that connects the source to the sink
>
> # Use a channel which buffers events in filesystem
> collector102.channels.channel1.type = file
> collector102.channels.channel1.checkpointDir = /data/flume_data/channel1/checkpoint
> collector102.channels.channel1.dataDirs = /data/flume_data/channel1/data
>
> # Use a channel which buffers events in filesystem
> collector102.channels.channel2.type = file
> collector102.channels.channel2.checkpointDir = /data/flume_data/channel2/checkpoint
> collector102.channels.channel2.dataDirs = /data/flume_data/channel2/data
>
> # Bind the source and sink to the channel configured above
> collector102.sources.source1.channels = channel1 channel2
> collector102.sinks.sink1.channel = channel1
> collector102.sinks.sink2.channel = channel2
>
> On Mon, Jan 14, 2013 at 2:25 PM, Connor Woodson <[EMAIL PROTECTED]> wrote:
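
As an aside on the bucketing settings in the config above: the %Y-%m-%d/%H00 escapes in hdfs.path are resolved from the timestamp header added by the Timestamp interceptor, and %{host} in filePrefix from the host header added by the Host interceptor. So an event received on collector102 around 19:00 on 2013-01-14 would land under a path roughly like the one below; the collector FQDN and the numeric part of the file name are illustrative, and the exact file name the sink generates may differ:

hdfs://namenode301.ngpipes.milp.ngmoco.com:9000/ngpipes-raw-logs/2013-01-14/1900/collector102.ngpipes.milp.ngmoco.com.1358190000000.gz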
Sagar Mehta 2013-01-14, 23:12
Brock Noland 2013-01-14, 23:16
Sagar Mehta 2013-01-14, 23:24
Sagar Mehta 2013-01-14, 23:27
Brock Noland 2013-01-14, 23:38
Sagar Mehta 2013-01-15, 00:43
Brock Noland 2013-01-15, 00:54
Sagar Mehta 2013-01-15, 01:03
Connor Woodson 2013-01-15, 01:17
Sagar Mehta 2013-01-15, 01:52
Bhaskar V. Karambelkar 2013-01-15, 01:25
Connor Woodson 2013-01-15, 01:26
Sagar Mehta 2013-01-15, 02:36
Connor Woodson 2013-01-14, 23:17