Question about gzip compression when using Flume NG
Hi Guys,

I'm using Flume NG and it works great for me. In essence, I'm using an exec
source to run tail -F on a log file, feeding two HDFS sinks through a
file channel. So far so good. Now I'm trying to enable gzip compression
with the following config, as per the Flume NG User Guide at
http://flume.apache.org/FlumeUserGuide.html.

#gzip compression related settings
collector102.sinks.sink1.hdfs.codeC = gzip
collector102.sinks.sink1.hdfs.fileType = CompressedStream
collector102.sinks.sink1.hdfs.fileSuffix = .gz

However, this is what appears to be happening:

Flume does write gzip-compressed output (I see the .gz files in the
output buckets). However, when I try to decompress a file, I get a
'trailing garbage ignored' warning, and the decompressed output is in fact
smaller than the compressed file.

hadoop@jobtracker301:/home/hadoop/sagar/temp$ ls -ltr collector102.ngpipes.sac.ngmoco.com.1357936638713.gz
-rw-r--r-- 1 hadoop hadoop 5381235 2013-01-11 20:44 collector102.ngpipes.sac.ngmoco.com.1357936638713.gz

hadoop@jobtracker301:/home/hadoop/sagar/temp$ gunzip collector102.ngpipes.sac.ngmoco.com.1357936638713.gz

gzip: collector102.ngpipes.sac.ngmoco.com.1357936638713.gz: decompression OK, trailing garbage ignored

hadoop@jobtracker301:/home/hadoop/sagar/temp$ ls -l
-rw-r--r-- 1 hadoop hadoop 58898 2013-01-11 20:44 collector102.ngpipes.sac.ngmoco.com.1357936638713

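For anyone who wants to reproduce this: gunzip prints 'trailing garbage ignored' when the bytes after a complete gzip member do not start another valid gzip member. A rough way to see where that boundary falls is to walk the file member by member. The sketch below is only an illustration under that assumption; inspect_gzip is a hypothetical helper name, not part of Flume or Hadoop, and it says nothing about why the extra bytes are there.

```python
import zlib


def inspect_gzip(data: bytes):
    """Walk concatenated gzip members in `data`.

    Returns (members, decompressed_size, trailing_bytes), where
    trailing_bytes counts the bytes left over after the last member
    that decoded completely. Hypothetical helper, for diagnosis only.
    """
    members = 0
    decompressed_size = 0
    pos = 0
    while pos < len(data):
        # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
        decoder = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
        try:
            out = decoder.decompress(data[pos:])
        except zlib.error:
            break  # bytes at `pos` are not a valid gzip member
        if not decoder.eof:
            break  # member is truncated; treat the rest as trailing
        members += 1
        decompressed_size += len(out)
        # unused_data holds whatever followed the member just decoded.
        pos = len(data) - len(decoder.unused_data)
    return members, decompressed_size, len(data) - pos
```

Reading the .gz file in binary mode and passing its contents to inspect_gzip gives the number of valid members, their total decompressed size, and how many bytes gunzip would be treating as trailing garbage.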
Below are some helpful details.

I'm using apache-flume-1.4.0-SNAPSHOT-bin:

smehta@collector102:/opt$ ls -l flume
lrwxrwxrwx 1 root root 31 2012-12-14 00:44 flume -> apache-flume-1.4.0-SNAPSHOT-bin

I also have the hadoop-core jar in my path:

smehta@collector102:/opt/flume/lib$ ls -l hadoop-core-0.20.2-cdh3u2.jar
-rw-r--r-- 1 hadoop hadoop 3534499 2012-12-01 01:53 hadoop-core-0.20.2-cdh3u2.jar

Everything is working well for me except the compression part. I'm not
quite sure what I'm missing here. While I debug this, any ideas or help
would be much appreciated.

Thanks in advance,
Sagar