Flume >> mail # user >> Question about gzip compression when using Flume Ng


Question about gzip compression when using Flume Ng
Hi Guys,

I'm using Flume NG and it works great for me. In essence, I'm using an exec
source to run tail -F on a log file, with two HDFS sinks fed by a
file channel. So far so good. Now I'm trying to enable gzip compression
with the following config, as per the Flume NG User Guide at
http://flume.apache.org/FlumeUserGuide.html.

#gzip compression related settings
collector102.sinks.sink1.hdfs.codeC = gzip
collector102.sinks.sink1.hdfs.fileType = CompressedStream
collector102.sinks.sink1.hdfs.fileSuffix = .gz

However, this is what appears to be happening:

Flume does write gzip-compressed output (I see the .gz files in the
output buckets). However, when I try to decompress a file, I get a
'trailing garbage ignored' warning, and the decompressed output is in fact
smaller than the compressed file.

hadoop@jobtracker301:/home/hadoop/sagar/temp$ ls -ltr collector102.ngpipes.sac.ngmoco.com.1357936638713.gz
-rw-r--r-- 1 hadoop hadoop 5381235 2013-01-11 20:44 collector102.ngpipes.sac.ngmoco.com.1357936638713.gz

hadoop@jobtracker301:/home/hadoop/sagar/temp$ gunzip collector102.ngpipes.sac.ngmoco.com.1357936638713.gz

gzip: collector102.ngpipes.sac.ngmoco.com.1357936638713.gz: decompression OK, trailing garbage ignored

hadoop@jobtracker301:/home/hadoop/sagar/temp$ ls -l

-rw-r--r-- 1 hadoop hadoop 58898 2013-01-11 20:44 collector102.ngpipes.sac.ngmoco.com.1357936638713

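One way to see where the valid gzip data ends is to walk the file member by member and count what is left over. The following is only a minimal Python sketch under stated assumptions: `inspect_gzip` is a hypothetical helper name (not part of Flume or Hadoop), and it reads the whole file into memory, which is acceptable for a file of a few megabytes like the one above.

```python
import zlib


def inspect_gzip(path):
    """Walk a .gz file one gzip member at a time (hypothetical helper).

    Returns (members, decompressed_bytes, trailing_bytes), where
    trailing_bytes is the number of bytes left after the last member
    that could be decoded completely.
    """
    with open(path, "rb") as f:
        data = f.read()

    members = 0
    decompressed = 0
    while data:
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)
        try:
            out = d.decompress(data)
            out += d.flush()
        except zlib.error:
            break  # the remaining bytes are not a valid gzip member
        if not d.eof:
            break  # the member is truncated (no end-of-stream marker)
        members += 1
        decompressed += len(out)
        data = d.unused_data  # bytes following this member, if any
    return members, decompressed, len(data)
```

Calling `inspect_gzip("collector102.ngpipes.sac.ngmoco.com.1357936638713.gz")` on the original compressed file would return the three counts. A large trailing-byte count would mean most of the file is not being read as gzip data, which would be consistent with the small decompressed size shown above.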
Below are some helpful details.

I'm using apache-flume-1.4.0-SNAPSHOT-bin:

smehta@collector102:/opt$ ls -l flume
lrwxrwxrwx 1 root root 31 2012-12-14 00:44 flume -> apache-flume-1.4.0-SNAPSHOT-bin

I also have the hadoop-core jar in my path:

smehta@collector102:/opt/flume/lib$ ls -l hadoop-core-0.20.2-cdh3u2.jar
-rw-r--r-- 1 hadoop hadoop 3534499 2012-12-01 01:53 hadoop-core-0.20.2-cdh3u2.jar

Everything is working well for me except the compression part, and I'm not
quite sure what I'm missing here. While I debug this, any ideas or help
would be much appreciated.

Thanks in advance,
Sagar