Re: Question about gzip compression when using Flume Ng
OK, so I dropped the new hadoop-core jar into /opt/flume/lib [I got some
errors about the guava dependency, so I put in that jar too]

smehta@collector102:/opt/flume/lib$ ls -ltrh | grep -e "hadoop-core" -e "guava"
-rw-r--r-- 1 hadoop hadoop 1.5M 2012-11-14 21:49 guava-10.0.1.jar
-rw-r--r-- 1 hadoop hadoop 3.7M 2013-01-14 23:50 hadoop-core-0.20.2-cdh3u5.jar
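
One thing worth double-checking here (my suggestion, not something from the thread): make sure the old hadoop-core jar was actually removed, since two versions side by side on the classpath tend to fail in confusing ways. A quick sanity check:

# expect exactly one hadoop-core jar; a leftover older copy can shadow the new one
ls /opt/flume/lib | grep -c "hadoop-core"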

Now I don't even see the file being created in HDFS, and the Flume log is
happily talking about housekeeping for some file channel checkpoints,
updating pointers, et al.
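
For reference, the gzip side of the HDFS sink configuration normally looks something like the lines below (a minimal sketch; the agent name, sink name, and path are placeholders, not taken from the actual config in this thread):

# placeholder agent/sink names -- only the hdfs.* keys matter here
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events/%Y-%m-%d
agent.sinks.hdfsSink.hdfs.fileType = CompressedStream
agent.sinks.hdfsSink.hdfs.codeC = gzip
agent.sinks.hdfsSink.hdfs.rollInterval = 300

With CompressedStream plus gzip, each flush can leave the output as several concatenated gzip members, which is the behaviour Brock describes further down the thread.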

Below is the tail of the Flume log:

hadoop@collector102:/data/flume_log$ tail -10 flume.log
2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel2] INFO org.apache.flume.channel.file.Log - Updated checkpoint for file: /data/flume_data/channel2/data/log-36 position: 129415524 logWriteOrderID: 1358209947324
2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel2] INFO org.apache.flume.channel.file.LogFile - Closing RandomReader /data/flume_data/channel2/data/log-34
2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel1] INFO org.apache.flume.channel.file.Log - Updated checkpoint for file: /data/flume_data/channel1/data/log-36 position: 129415524 logWriteOrderID: 1358209947323
2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel1] INFO org.apache.flume.channel.file.LogFile - Closing RandomReader /data/flume_data/channel1/data/log-34
2013-01-15 00:42:10,819 [Log-BackgroundWorker-channel2] INFO org.apache.flume.channel.file.LogFileV3 - Updating log-34.meta currentPosition = 18577138, logWriteOrderID = 1358209947324
2013-01-15 00:42:10,819 [Log-BackgroundWorker-channel1] INFO org.apache.flume.channel.file.LogFileV3 - Updating log-34.meta currentPosition = 18577138, logWriteOrderID = 1358209947323
2013-01-15 00:42:10,820 [Log-BackgroundWorker-channel1] INFO org.apache.flume.channel.file.LogFile - Closing RandomReader /data/flume_data/channel1/data/log-35
2013-01-15 00:42:10,821 [Log-BackgroundWorker-channel2] INFO org.apache.flume.channel.file.LogFile - Closing RandomReader /data/flume_data/channel2/data/log-35
2013-01-15 00:42:10,826 [Log-BackgroundWorker-channel1] INFO org.apache.flume.channel.file.LogFileV3 - Updating log-35.meta currentPosition = 217919486, logWriteOrderID = 1358209947323
2013-01-15 00:42:10,826 [Log-BackgroundWorker-channel2] INFO org.apache.flume.channel.file.LogFileV3 - Updating log-35.meta currentPosition = 217919486, logWriteOrderID = 1358209947324

Sagar
On Mon, Jan 14, 2013 at 3:38 PM, Brock Noland <[EMAIL PROTECTED]> wrote:

> Hmm, could you try an updated version of Hadoop? CDH3u2 is quite old;
> I would upgrade to CDH3u5 or CDH 4.1.2.
>
> On Mon, Jan 14, 2013 at 3:27 PM, Sagar Mehta <[EMAIL PROTECTED]> wrote:
> > About the bz2 suggestion, we have a ton of downstream jobs that assume gzip-compressed files - so it is better to stick to gzip.
> >
> > Plan B for us is to have an Oozie step to gzip-compress the logs before proceeding with downstream Hadoop jobs - but that looks like a hack to me!!
> >
> > Sagar
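
The "plan B" Oozie step mentioned above would boil down to something like this (a rough sketch with made-up paths, assuming the step simply shells out to the fs commands):

# hypothetical paths; compress a raw log in HDFS before the downstream jobs run
hadoop fs -cat /logs/raw/collector102.1358204406896 | gzip | hadoop fs -put - /logs/gz/collector102.1358204406896.gz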
> >
> >
> > On Mon, Jan 14, 2013 at 3:24 PM, Sagar Mehta <[EMAIL PROTECTED]>
> wrote:
> >>
> >> hadoop@jobtracker301:/home/hadoop/sagar/debug$ zcat collector102.ngpipes.sac.ngmoco.com.1358204406896.gz | wc -l
> >>
> >> gzip: collector102.ngpipes.sac.ngmoco.com.1358204406896.gz: decompression OK, trailing garbage ignored
> >> 100
> >>
> >> This should be about 50,000 events for the 5 min window!!
> >>
> >> Sagar
> >>
> >> On Mon, Jan 14, 2013 at 3:16 PM, Brock Noland <[EMAIL PROTECTED]>
> wrote:
> >>>
> >>> Hi,
> >>>
> >>> Can you try:  zcat file > output
> >>>
> >>> I think what is occurring is that, because of the flush, the output file
> >>> is actually several concatenated gz files.
> >>>
> >>> Brock
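
For anyone following along, this is what "several concatenated gz files" looks like in practice (an illustration with throwaway files, not output from the thread):

# build one file out of two gzip members; zcat reads every member in turn
echo first  | gzip >  /tmp/multi.gz
echo second | gzip >> /tmp/multi.gz
zcat /tmp/multi.gz | wc -l    # prints 2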
> >>>
> >>> On Mon, Jan 14, 2013 at 3:12 PM, Sagar Mehta <[EMAIL PROTECTED]>
> >>> wrote:
> >>> > Yeah I have tried the text write format in vain before, but
> >>> > nevertheless
> >>> > gave it a try again!! Below is the latest file - still the same