Flume, mail # user - Question about gzip compression when using Flume Ng


Re: Question about gzip compression when using Flume Ng
Brock Noland 2013-01-15, 00:54
Hi,

That's just the file channel. The HDFSEventSink will need a heck of a
lot more than just those two jars. To override the version of Hadoop
it will find from the hadoop command, you probably want to set
HADOOP_HOME in flume-env.sh to your custom install.
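[A minimal sketch of that flume-env.sh change; the install path below is illustrative, not from the thread:]

```shell
# /opt/flume/conf/flume-env.sh (path to the custom Hadoop install is hypothetical)
# Point Flume at the desired Hadoop so the HDFSEventSink resolves its
# hadoop jars from here rather than whatever `hadoop` is on the PATH.
export HADOOP_HOME=/opt/hadoop-0.20.2-cdh3u5
export PATH="$HADOOP_HOME/bin:$PATH"
```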

Also, the client and server should be the same version.

Brock

On Mon, Jan 14, 2013 at 4:43 PM, Sagar Mehta <[EMAIL PROTECTED]> wrote:
> ok so I dropped in the new hadoop-core jar in /opt/flume/lib [I got some
> errors about the guava dependencies so put in that jar too]
>
> smehta@collector102:/opt/flume/lib$ ls -ltrh | grep -e "hadoop-core" -e
> "guava"
> -rw-r--r-- 1 hadoop hadoop 1.5M 2012-11-14 21:49 guava-10.0.1.jar
> -rw-r--r-- 1 hadoop hadoop 3.7M 2013-01-14 23:50
> hadoop-core-0.20.2-cdh3u5.jar
>
> Now I don't even see the file being created in hdfs and the flume log is
> happily talking about housekeeping for some file channel checkpoints,
> updating pointers et al
>
> Below is tail of flume log
>
> hadoop@collector102:/data/flume_log$ tail -10 flume.log
> 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel2] INFO
> org.apache.flume.channel.file.Log - Updated checkpoint for file:
> /data/flume_data/channel2/data/log-36 position: 129415524 logWriteOrderID:
> 1358209947324
> 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel2] INFO
> org.apache.flume.channel.file.LogFile - Closing RandomReader
> /data/flume_data/channel2/data/log-34
> 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel1] INFO
> org.apache.flume.channel.file.Log - Updated checkpoint for file:
> /data/flume_data/channel1/data/log-36 position: 129415524 logWriteOrderID:
> 1358209947323
> 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel1] INFO
> org.apache.flume.channel.file.LogFile - Closing RandomReader
> /data/flume_data/channel1/data/log-34
> 2013-01-15 00:42:10,819 [Log-BackgroundWorker-channel2] INFO
> org.apache.flume.channel.file.LogFileV3 - Updating log-34.meta
> currentPosition = 18577138, logWriteOrderID = 1358209947324
> 2013-01-15 00:42:10,819 [Log-BackgroundWorker-channel1] INFO
> org.apache.flume.channel.file.LogFileV3 - Updating log-34.meta
> currentPosition = 18577138, logWriteOrderID = 1358209947323
> 2013-01-15 00:42:10,820 [Log-BackgroundWorker-channel1] INFO
> org.apache.flume.channel.file.LogFile - Closing RandomReader
> /data/flume_data/channel1/data/log-35
> 2013-01-15 00:42:10,821 [Log-BackgroundWorker-channel2] INFO
> org.apache.flume.channel.file.LogFile - Closing RandomReader
> /data/flume_data/channel2/data/log-35
> 2013-01-15 00:42:10,826 [Log-BackgroundWorker-channel1] INFO
> org.apache.flume.channel.file.LogFileV3 - Updating log-35.meta
> currentPosition = 217919486, logWriteOrderID = 1358209947323
> 2013-01-15 00:42:10,826 [Log-BackgroundWorker-channel2] INFO
> org.apache.flume.channel.file.LogFileV3 - Updating log-35.meta
> currentPosition = 217919486, logWriteOrderID = 1358209947324
>
> Sagar
>
>
> On Mon, Jan 14, 2013 at 3:38 PM, Brock Noland <[EMAIL PROTECTED]> wrote:
>>
>> Hmm, could you try an updated version of Hadoop? CDH3u2 is quite old,
>> I would upgrade to CDH3u5 or CDH 4.1.2.
>>
>> On Mon, Jan 14, 2013 at 3:27 PM, Sagar Mehta <[EMAIL PROTECTED]> wrote:
>> > About the bz2 suggestion, we have a ton of downstream jobs that assume
>> > gzip
>> > compressed files - so it is better to stick to gzip.
>> >
>> > The plan B for us is to have an Oozie step to gzip compress the logs
>> > before
>> > proceeding with downstream Hadoop jobs - but that looks like a hack to
>> > me!!
>> >
>> > Sagar
>> >
>> >
>> > On Mon, Jan 14, 2013 at 3:24 PM, Sagar Mehta <[EMAIL PROTECTED]>
>> > wrote:
>> >>
>> >> hadoop@jobtracker301:/home/hadoop/sagar/debug$ zcat
>> >> collector102.ngpipes.sac.ngmoco.com.1358204406896.gz | wc -l
>> >>
>> >> gzip: collector102.ngpipes.sac.ngmoco.com.1358204406896.gz:
>> >> decompression
>> >> OK, trailing garbage ignored
>> >> 100
>> >>
>> >> This should be about 50,000 events for the 5 min window!!
>> >>
>> >> Sagar
>> >>
>
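[The "trailing garbage ignored" warning above means zcat hit non-gzip bytes after the end of the first gzip stream, so everything written after that point is dropped. A small standalone reproduction, using a throwaway file under /tmp:]

```shell
# Build a file that is one valid gzip member followed by raw, uncompressed
# bytes -- roughly what a sink produces if it keeps writing after closing
# the gzip stream.
printf 'line1\nline2\n' | gzip > /tmp/trailing-demo.gz
printf 'plain uncompressed bytes\n' >> /tmp/trailing-demo.gz

# zcat warns "decompression OK, trailing garbage ignored" on stderr and
# emits only the two lines from the gzip member.
zcat /tmp/trailing-demo.gz 2>/dev/null | wc -l   # prints 2
```

This matches the symptom in the thread: the file decompresses "OK" but yields far fewer records than were written, because only the first gzip member is recovered.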