

Sagar Mehta 2013-02-27, 19:37
Roshan Naik 2013-02-28, 22:43
Re: Flume Ng replaying events when the source is idle
Can you also send the Flume agent logs? Did you check the contents of the files?

--
Hari Shreedharan
On Thursday, February 28, 2013 at 2:43 PM, Roshan Naik wrote:

> Would you be able to verify whether the same problem can be reproduced using the memory channel instead, in a test setup?
>
>
> On Wed, Feb 27, 2013 at 11:37 AM, Sagar Mehta <[EMAIL PROTECTED] (mailto:[EMAIL PROTECTED])> wrote:
> > Hi Guys,
> >
> > I'm using Flume NG and it is working pretty well, except for a weird situation which I observed lately. In essence, I'm using an exec source to do tail -F on a logfile, with two HDFS sinks and a file channel.
> >
> > However, I have observed that when the source [the logfile of a Jetty-based collector] is idle - that is, no new events are pushed to the logfile - Flume NG seems to replay the same set of events.
> >
> > For example, collector110 received no events for two consecutive hours, and below are the corresponding files Flume wrote at the HDFS sink:
> >
> > hadoop@jobtracker301:/home/hadoop/sagar$ hls /ngpipes-raw-logs/2013-02-27/1400/collector110*
> > -rw-r--r--   3 hadoop supergroup        441 2013-02-27 14:20 /ngpipes-raw-logs/2013-02-27/1400/collector110.ngpipes.sac.ngmoco.com.1361974853210.gz
> > -rw-r--r--   3 hadoop supergroup        441 2013-02-27 14:50 /ngpipes-raw-logs/2013-02-27/1400/collector110.ngpipes.sac.ngmoco.com.1361976653432.gz
> >
> > hadoop@jobtracker301:/home/hadoop/sagar$ hls /ngpipes-raw-logs/2013-02-27/1500/collector110*
> > -rw-r--r--   3 hadoop supergroup        441 2013-02-27 15:20 /ngpipes-raw-logs/2013-02-27/1500/collector110.ngpipes.sac.ngmoco.com.1361978454123.gz
> > -rw-r--r--   3 hadoop supergroup        441 2013-02-27 15:50 /ngpipes-raw-logs/2013-02-27/1500/collector110.ngpipes.sac.ngmoco.com.1361980254338.gz
> >
> > hadoop@jobtracker301:/home/hadoop/sagar$ md5sum *
> > c7360ef5c8deaee3ce9f4c92e9d9be63  collector110.ngpipes.sac.ngmoco.com.1361974853210.gz
> > c7360ef5c8deaee3ce9f4c92e9d9be63  collector110.ngpipes.sac.ngmoco.com.1361976653432.gz
> > c7360ef5c8deaee3ce9f4c92e9d9be63  collector110.ngpipes.sac.ngmoco.com.1361978454123.gz
> > c7360ef5c8deaee3ce9f4c92e9d9be63  collector110.ngpipes.sac.ngmoco.com.1361980254338.gz
> >
> >
> > As you can see above, the md5sums match.
> >
> > I'm using a file channel, which has checkpoints, so I'm not sure what is going on. Btw, the difference between the timestamps of the two replays looks like it is exactly 30 minutes.
> >
> > Is this a known bug or am I missing something?
> >
> > Below is my Flume config file
> >
> > smehta@collector110:/opt/flume/conf$ cat hdfs.conf
> > # An hdfs sink to write events to the hdfs on the test cluster
> > # A memory based channel to connect the above source and sink
> >
> > # Name the components on this agent
> > collector110.sources = source1
> > collector110.sinks = sink1 sink2
> > collector110.channels = channel1 channel2
> >
> > # Configure the source
> > collector110.sources.source1.type = exec
> > collector110.sources.source1.command = tail -F /opt/jetty/logFile.log
> >
> > # Configure the interceptors
> > collector110.sources.source1.interceptors = TimestampInterceptor HostInterceptor
> >
> > # We use the Timestamp interceptor to get timestamps of when flume receives events
> > # This is used for figuring out the bucket to which an event goes
> > collector110.sources.source1.interceptors.TimestampInterceptor.type = timestamp
> >
> > # We use the Host interceptor to populate the host header with the fully qualified domain name of the collector.
> > # That way we know which file in the sink represents which collector.
> > collector110.sources.source1.interceptors.HostInterceptor.type = org.apache.flume.interceptor.HostInterceptor$Builder
> > collector110.sources.source1.interceptors.HostInterceptor.preserveExisting = false
> > collector110.sources.source1.interceptors.HostInterceptor.useIP = false
> > collector110.sources.source1.interceptors.HostInterceptor.hostHeader = host
> >
> >
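Per Roshan's suggestion above, a minimal sketch of how the same agent could be pointed at a memory channel for a test run. This is an illustrative fragment, not part of Sagar's actual config; the capacity values are placeholder assumptions.

```properties
# Hypothetical test variant: swap the file channel for a memory channel
# to check whether the replay still occurs. Capacities are placeholders.
collector110.channels = channel1
collector110.channels.channel1.type = memory
collector110.channels.channel1.capacity = 10000
collector110.channels.channel1.transactionCapacity = 1000

# Rebind the existing source and sink to the memory channel
collector110.sources.source1.channels = channel1
collector110.sinks.sink1.channel = channel1
```

If the duplicates disappear with the memory channel, that would point at the file channel; if they persist, the exec source (tail -F) or the HDFS sink roll settings would be the more likely suspects.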
Subsequent replies in this thread:
Sagar Mehta 2013-03-04, 22:42
Sagar Mehta 2013-03-04, 23:06
Hari Shreedharan 2013-03-05, 00:13
Mike Percy 2013-03-05, 03:10
Sagar Mehta 2013-03-05, 17:53