Flume, mail # user - Question about gzip compression when using Flume Ng


Re: Question about gzip compression when using Flume Ng
Sagar Mehta 2013-01-14, 23:24
hadoop@jobtracker301:/home/hadoop/sagar/debug$ zcat collector102.ngpipes.sac.ngmoco.com.1358204406896.gz | wc -l

gzip: collector102.ngpipes.sac.ngmoco.com.1358204406896.gz: decompression OK, trailing garbage ignored
100

The file should contain about 50,000 events for the 5-minute window!!

Sagar

On Mon, Jan 14, 2013 at 3:16 PM, Brock Noland <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Can you try:  zcat file > output
>
> I think what is happening is that, because of the flush, the output file
> is actually several concatenated gz files.
>
> Brock
>
> On Mon, Jan 14, 2013 at 3:12 PM, Sagar Mehta <[EMAIL PROTECTED]> wrote:
> > Yeah I have tried the text write format in vain before, but nevertheless
> > gave it a try again!! Below is the latest file - still the same thing.
> >
> > hadoop@jobtracker301:/home/hadoop/sagar/debug$ date
> > Mon Jan 14 23:02:07 UTC 2013
> >
> > hadoop@jobtracker301:/home/hadoop/sagar/debug$ hls /ngpipes-raw-logs/2013-01-14/2200/collector102.ngpipes.sac.ngmoco.com.1358204141600.gz
> > Found 1 items
> > -rw-r--r--   3 hadoop supergroup    4798117 2013-01-14 22:55 /ngpipes-raw-logs/2013-01-14/2200/collector102.ngpipes.sac.ngmoco.com.1358204141600.gz
> >
> > hadoop@jobtracker301:/home/hadoop/sagar/debug$ hget /ngpipes-raw-logs/2013-01-14/2200/collector102.ngpipes.sac.ngmoco.com.1358204141600.gz .
> > hadoop@jobtracker301:/home/hadoop/sagar/debug$ gunzip collector102.ngpipes.sac.ngmoco.com.1358204141600.gz
> >
> > gzip: collector102.ngpipes.sac.ngmoco.com.1358204141600.gz: decompression OK, trailing garbage ignored
> >
> > Interestingly enough, the gzip page says it is a harmless warning -
> > http://www.gzip.org/#faq8
> >
> > However, I'm losing events on decompression, so I cannot afford to ignore
> > this warning. The gzip page gives an example involving magnetic tape; an
> > HDFS block is analogous here, since the file is initially stored in HDFS
> > before I pull it out onto the local filesystem.
> >
> > Sagar
> >
> >
> >
> >
> > On Mon, Jan 14, 2013 at 2:52 PM, Connor Woodson <[EMAIL PROTECTED]>
> > wrote:
> >>
> >> collector102.sinks.sink1.hdfs.writeFormat = TEXT
> >> collector102.sinks.sink2.hdfs.writeFormat = TEXT
> >
> >
> >
>
>
>
> --
> Apache MRUnit - Unit testing MapReduce -
> http://incubator.apache.org/mrunit/
>
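
Brock's hypothesis above is that each flush closes one gzip stream, so the file on HDFS is really several gz files concatenated together. Here is a minimal Python sketch of how that hypothesis could be checked, assuming the file really has that layout. The helper names (make_multi_member_gzip, count_lines_all_members, count_members) are made up for illustration and are not part of Flume or Hadoop. The sketch builds a small multi-member file in memory, counts lines across all members, and counts how many members decode cleanly before any trailing bytes.

```python
import gzip
import io
import zlib


def make_multi_member_gzip(chunks):
    """Build bytes that look like several gzip files concatenated together."""
    return b"".join(gzip.compress(chunk) for chunk in chunks)


def count_lines_all_members(data):
    """Count lines across every gzip member; GzipFile reads members in sequence."""
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as fh:
        return sum(1 for _ in fh)


def count_members(data):
    """Count the gzip members that decode cleanly, one member at a time."""
    members = 0
    while data:
        decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)  # expect a gzip header
        try:
            decomp.decompress(data)
        except zlib.error:
            break  # remaining bytes are not a valid gzip member
        if not decomp.eof:
            break  # member is truncated
        members += 1
        data = decomp.unused_data  # bytes left after this member
    return members


if __name__ == "__main__":
    # Hypothetical stand-in for a file produced by several flushes.
    sample = make_multi_member_gzip([b"event 1\nevent 2\n", b"event 3\n"])
    print("members:", count_members(sample))
    print("lines:", count_lines_all_members(sample))
```

To try this on a real file, read its bytes with open(path, "rb").read() and pass them to the two counting functions. If count_members stops well short of the end of the file, the bytes after the last valid member are what gzip reports as trailing garbage.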