Re: Question about gzip compression when using Flume Ng
About the bz2 suggestion, we have a ton of downstream jobs that assume gzip
compressed files - so it is better to stick to gzip.

Plan B for us is to have an Oozie step to gzip-compress the logs before
proceeding with the downstream Hadoop jobs - but that looks like a hack to me!!
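
Something like this is what I have in mind - an untested sketch, assuming we
let Flume write the files uncompressed and have the Oozie shell action gzip
them afterwards (plain hadoop fs commands instead of our hls/hget aliases,
hour directory hard-coded just for illustration):

# gzip every plain file in one hour's directory before the downstream jobs run
HOUR_DIR=/ngpipes-raw-logs/2013-01-14/2200
for f in $(hadoop fs -ls "$HOUR_DIR" | awk '$NF ~ /\// && $NF !~ /\.gz$/ {print $NF}'); do
  base=$(basename "$f")
  # pull the uncompressed log down, gzip it, push it back, drop the original
  hadoop fs -cat "$f" | gzip > "$base".gz
  hadoop fs -put "$base".gz "$HOUR_DIR/"
  hadoop fs -rm "$f"
  rm -f "$base".gz
done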

Sagar

On Mon, Jan 14, 2013 at 3:24 PM, Sagar Mehta <[EMAIL PROTECTED]> wrote:

> hadoop@jobtracker301:/home/hadoop/sagar/debug$ zcat collector102.ngpipes.sac.ngmoco.com.1358204406896.gz | wc -l
>
> gzip: collector102.ngpipes.sac.ngmoco.com.1358204406896.gz: decompression OK, trailing garbage ignored
> 100
>
> This should be about 50,000 events for the 5 min window!!
>
> Sagar
>
> On Mon, Jan 14, 2013 at 3:16 PM, Brock Noland <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> Can you try:  zcat file > output
>>
>> I think what is occurring is that, because of the flush, the output file is
>> actually several concatenated gz files.
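>>
>> A quick local illustration of what I mean (hypothetical file name, just to
>> show the effect - appending two separately compressed chunks still gives a
>> valid gzip stream of two members):
>>
>> printf 'a\nb\n' | gzip > multi.gz    # first member
>> printf 'c\nd\n' | gzip >> multi.gz   # second member, like a later flush
>> zcat multi.gz | wc -l                # prints 4: zcat walks every member
>>
>> A decoder that stops after the first member would only ever see the first
>> chunk, which would explain the missing events downstream.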
>>
>> Brock
>>
>> On Mon, Jan 14, 2013 at 3:12 PM, Sagar Mehta <[EMAIL PROTECTED]>
>> wrote:
>> > Yeah, I have tried the text write format before, in vain, but nevertheless
>> > gave it another try!! Below is the latest file - still the same thing.
>> >
>> > hadoop@jobtracker301:/home/hadoop/sagar/debug$ date
>> > Mon Jan 14 23:02:07 UTC 2013
>> >
>> > hadoop@jobtracker301:/home/hadoop/sagar/debug$ hls /ngpipes-raw-logs/2013-01-14/2200/collector102.ngpipes.sac.ngmoco.com.1358204141600.gz
>> > Found 1 items
>> > -rw-r--r--   3 hadoop supergroup    4798117 2013-01-14 22:55 /ngpipes-raw-logs/2013-01-14/2200/collector102.ngpipes.sac.ngmoco.com.1358204141600.gz
>> >
>> > hadoop@jobtracker301:/home/hadoop/sagar/debug$ hget /ngpipes-raw-logs/2013-01-14/2200/collector102.ngpipes.sac.ngmoco.com.1358204141600.gz .
>> > hadoop@jobtracker301:/home/hadoop/sagar/debug$ gunzip collector102.ngpipes.sac.ngmoco.com.1358204141600.gz
>> >
>> > gzip: collector102.ngpipes.sac.ngmoco.com.1358204141600.gz: decompression OK, trailing garbage ignored
>> >
>> > Interestingly enough, the gzip FAQ says this is a harmless warning -
>> > http://www.gzip.org/#faq8
>> >
>> > However, I'm losing events on decompression, so I cannot afford to ignore
>> > this warning. The gzip FAQ gives an example about magnetic tape - maybe
>> > there is an analogous HDFS-block effect here, since the file is initially
>> > stored in HDFS before I pull it out onto the local filesystem.
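>> >
>> > As a rough check (just an idea - the three header bytes can in principle
>> > also occur inside compressed data, so the count is only approximate),
>> > counting gzip member headers in a fresh copy of the .gz should tell us
>> > whether it really is a pile of concatenated members or one member
>> > followed by something else entirely:
>> >
>> > grep -ao $'\x1f\x8b\x08' collector102.ngpipes.sac.ngmoco.com.1358204141600.gz | wc -l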
>> >
>> > Sagar
>> >
>> >
>> >
>> >
>> > On Mon, Jan 14, 2013 at 2:52 PM, Connor Woodson <[EMAIL PROTECTED]> wrote:
>> >>
>> >> collector102.sinks.sink1.hdfs.writeFormat = TEXT
>> >> collector102.sinks.sink2.hdfs.writeFormat = TEXT
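>> >>
>> >> For context, the rest of a gzip-compressed HDFS sink block would be along
>> >> these lines (same again for sink2; the property names are the stock hdfs
>> >> sink ones, the values are only examples sized for your 5-minute files,
>> >> and a larger batchSize should mean fewer flushes and hence fewer appended
>> >> gzip members):
>> >>
>> >> collector102.sinks.sink1.hdfs.fileType = CompressedStream
>> >> collector102.sinks.sink1.hdfs.codeC = gzip
>> >> # roll purely on time: one file per 5-minute window
>> >> collector102.sinks.sink1.hdfs.rollInterval = 300
>> >> collector102.sinks.sink1.hdfs.rollSize = 0
>> >> collector102.sinks.sink1.hdfs.rollCount = 0
>> >> # flush less often
>> >> collector102.sinks.sink1.hdfs.batchSize = 10000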
>> >
>> >
>> >
>>
>>
>>
>> --
>> Apache MRUnit - Unit testing MapReduce -
>> http://incubator.apache.org/mrunit/
>>
>
>