Re: Question about gzip compression when using Flume Ng
Hmm - good point!! Even in the best case, assuming this works, moving to a
newer Hadoop version for the two production clusters that depend on it [400+
nodes] will need some thorough testing and won't be immediate.

I would have loved for the gzip compression part to work more or less out
of the box, but for now the most likely path seems to be an Oozie step that
pre-compresses the data before the downstream jobs take over.

I'm still open to suggestions/insights from this group which has been
super-prompt so far :)

Sagar

On Mon, Jan 14, 2013 at 4:54 PM, Brock Noland <[EMAIL PROTECTED]> wrote:

> Hi,
>
> That's just the file channel. The HDFSEventSink will need a heck of a
> lot more than just those two jars. To override the version of Hadoop
> that Flume picks up via the hadoop command, you probably want to set
> HADOOP_HOME in flume-env.sh to point at your custom install.
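>
> For example, in conf/flume-env.sh [the install path below is just a
> placeholder for wherever your custom Hadoop lives]:
>
> export HADOOP_HOME=/opt/hadoop-0.20.2-cdh3u5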
>
> Also, the client and server should be the same version.
>
> Brock
>
> On Mon, Jan 14, 2013 at 4:43 PM, Sagar Mehta <[EMAIL PROTECTED]> wrote:
> > ok so I dropped the new hadoop-core jar into /opt/flume/lib [I got some
> > errors about the guava dependency, so I put in that jar too]
> >
> > smehta@collector102:/opt/flume/lib$ ls -ltrh | grep -e "hadoop-core" -e "guava"
> > -rw-r--r-- 1 hadoop hadoop 1.5M 2012-11-14 21:49 guava-10.0.1.jar
> > -rw-r--r-- 1 hadoop hadoop 3.7M 2013-01-14 23:50 hadoop-core-0.20.2-cdh3u5.jar
> >
> > Now I don't even see the file being created in HDFS, and the flume log is
> > happily talking about housekeeping for some file channel checkpoints,
> > updating pointers, etc.
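> >
> > To double-check, I'm listing the sink's target directory with something
> > like this [the path below is just an example] and nothing new shows up:
> >
> > hadoop fs -ls /flume/events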
> >
> > Below is a tail of the flume log:
> >
> > hadoop@collector102:/data/flume_log$ tail -10 flume.log
> > 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel2] INFO org.apache.flume.channel.file.Log - Updated checkpoint for file: /data/flume_data/channel2/data/log-36 position: 129415524 logWriteOrderID: 1358209947324
> > 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel2] INFO org.apache.flume.channel.file.LogFile - Closing RandomReader /data/flume_data/channel2/data/log-34
> > 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel1] INFO org.apache.flume.channel.file.Log - Updated checkpoint for file: /data/flume_data/channel1/data/log-36 position: 129415524 logWriteOrderID: 1358209947323
> > 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel1] INFO org.apache.flume.channel.file.LogFile - Closing RandomReader /data/flume_data/channel1/data/log-34
> > 2013-01-15 00:42:10,819 [Log-BackgroundWorker-channel2] INFO org.apache.flume.channel.file.LogFileV3 - Updating log-34.meta currentPosition = 18577138, logWriteOrderID = 1358209947324
> > 2013-01-15 00:42:10,819 [Log-BackgroundWorker-channel1] INFO org.apache.flume.channel.file.LogFileV3 - Updating log-34.meta currentPosition = 18577138, logWriteOrderID = 1358209947323
> > 2013-01-15 00:42:10,820 [Log-BackgroundWorker-channel1] INFO org.apache.flume.channel.file.LogFile - Closing RandomReader /data/flume_data/channel1/data/log-35
> > 2013-01-15 00:42:10,821 [Log-BackgroundWorker-channel2] INFO org.apache.flume.channel.file.LogFile - Closing RandomReader /data/flume_data/channel2/data/log-35
> > 2013-01-15 00:42:10,826 [Log-BackgroundWorker-channel1] INFO org.apache.flume.channel.file.LogFileV3 - Updating log-35.meta currentPosition = 217919486, logWriteOrderID = 1358209947323
> > 2013-01-15 00:42:10,826 [Log-BackgroundWorker-channel2] INFO org.apache.flume.channel.file.LogFileV3 - Updating log-35.meta currentPosition = 217919486, logWriteOrderID = 1358209947324
> >
> > Sagar
> >
> >
> > On Mon, Jan 14, 2013 at 3:38 PM, Brock Noland <[EMAIL PROTECTED]> wrote:
> >>
> >> Hmm, could you try an updated version of Hadoop? CDH3u2 is quite old;
> >> I would upgrade to CDH3u5 or CDH 4.1.2.
> >>
> >> On Mon, Jan 14, 2013 at 3:27 PM, Sagar Mehta <[EMAIL PROTECTED]> wrote:
> >> > About the bz2 suggestion, we have a ton of downstream jobs that assume