Flume, mail # user - Question about gzip compression when using Flume Ng


Re: Question about gzip compression when using Flume Ng
Sagar Mehta 2013-01-15, 01:03
Hmm - good point!! Even in the best case, where this works, moving the two
production clusters that depend on it [400+ nodes] to a newer Hadoop
version will need some thorough testing and won't be immediate.

I would have loved for the gzip compression part to work more or less out
of the box, but for now the most likely path seems to be an Oozie step to
pre-compress the files before the downstream jobs take over.
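
To make the intent concrete: the out-of-the-box setup I was hoping for is
roughly the HDFS sink config below [agent/sink names and the path are
illustrative, not our actual config]:

agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://namenode:8020/flume/events
agent1.sinks.sink1.hdfs.fileType = CompressedStream
agent1.sinks.sink1.hdfs.codeC = gzip

The Oozie fallback would essentially boil down to a pre-compress step per
file, along these lines [paths illustrative; hadoop fs -put reads stdin
when given "-"]:

hadoop fs -cat /incoming/events.log | gzip | hadoop fs -put - /incoming/events.log.gz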

I'm still open to suggestions/insights from this group, which has been
super-prompt so far :)

Sagar

On Mon, Jan 14, 2013 at 4:54 PM, Brock Noland <[EMAIL PROTECTED]> wrote:

> Hi,
>
> That's just the file channel. The HDFSEventSink will need a heck of a
> lot more than just those two jars. To override the version of Hadoop
> it will pick up from the hadoop command, you probably want to set
> HADOOP_HOME in flume-env.sh to point at your custom install.
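>
> For example, a minimal flume-env.sh tweak would be [the install path
> here is hypothetical]:
>
>     export HADOOP_HOME=/opt/hadoop-0.20.2-cdh3u5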
>
> Also, the client and server should be the same version.
>
> Brock
>
> On Mon, Jan 14, 2013 at 4:43 PM, Sagar Mehta <[EMAIL PROTECTED]> wrote:
> > ok so I dropped the new hadoop-core jar into /opt/flume/lib [I got some
> > errors about the guava dependencies, so I put in that jar too]
> >
> > smehta@collector102:/opt/flume/lib$ ls -ltrh | grep -e "hadoop-core" -e "guava"
> > -rw-r--r-- 1 hadoop hadoop 1.5M 2012-11-14 21:49 guava-10.0.1.jar
> > -rw-r--r-- 1 hadoop hadoop 3.7M 2013-01-14 23:50 hadoop-core-0.20.2-cdh3u5.jar
> >
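> > As a sanity check, one quick way to confirm which hadoop-core jar the
> > running agent actually picked up on its classpath [the ps/grep pipeline
> > is just illustrative]:
> >
> > hadoop@collector102:~$ ps aux | grep [f]lume | tr ':' '\n' | grep hadoop-core
> >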
> > Now I don't even see the file being created in HDFS, and the flume log
> > is happily talking about housekeeping for some file channel checkpoints,
> > updating pointers et al.
> >
> > Below is a tail of the flume log:
> >
> > hadoop@collector102:/data/flume_log$ tail -10 flume.log
> > 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel2] INFO org.apache.flume.channel.file.Log - Updated checkpoint for file: /data/flume_data/channel2/data/log-36 position: 129415524 logWriteOrderID: 1358209947324
> > 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel2] INFO org.apache.flume.channel.file.LogFile - Closing RandomReader /data/flume_data/channel2/data/log-34
> > 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel1] INFO org.apache.flume.channel.file.Log - Updated checkpoint for file: /data/flume_data/channel1/data/log-36 position: 129415524 logWriteOrderID: 1358209947323
> > 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel1] INFO org.apache.flume.channel.file.LogFile - Closing RandomReader /data/flume_data/channel1/data/log-34
> > 2013-01-15 00:42:10,819 [Log-BackgroundWorker-channel2] INFO org.apache.flume.channel.file.LogFileV3 - Updating log-34.meta currentPosition = 18577138, logWriteOrderID = 1358209947324
> > 2013-01-15 00:42:10,819 [Log-BackgroundWorker-channel1] INFO org.apache.flume.channel.file.LogFileV3 - Updating log-34.meta currentPosition = 18577138, logWriteOrderID = 1358209947323
> > 2013-01-15 00:42:10,820 [Log-BackgroundWorker-channel1] INFO org.apache.flume.channel.file.LogFile - Closing RandomReader /data/flume_data/channel1/data/log-35
> > 2013-01-15 00:42:10,821 [Log-BackgroundWorker-channel2] INFO org.apache.flume.channel.file.LogFile - Closing RandomReader /data/flume_data/channel2/data/log-35
> > 2013-01-15 00:42:10,826 [Log-BackgroundWorker-channel1] INFO org.apache.flume.channel.file.LogFileV3 - Updating log-35.meta currentPosition = 217919486, logWriteOrderID = 1358209947323
> > 2013-01-15 00:42:10,826 [Log-BackgroundWorker-channel2] INFO org.apache.flume.channel.file.LogFileV3 - Updating log-35.meta currentPosition = 217919486, logWriteOrderID = 1358209947324
> >
> > Sagar
> >
> >
> > On Mon, Jan 14, 2013 at 3:38 PM, Brock Noland <[EMAIL PROTECTED]>
> wrote:
> >>
> >> Hmm, could you try an updated version of Hadoop? CDH3u2 is quite old;
> >> I would upgrade to CDH3u5 or CDH 4.1.2.
> >>
> >> On Mon, Jan 14, 2013 at 3:27 PM, Sagar Mehta <[EMAIL PROTECTED]>
> wrote:
> >> > About the bz2 suggestion, we have a ton of downstream jobs that assume