Flume >> mail # user >> Flume Ng replaying events when the source is idle


Re: Flume Ng replaying events when the source is idle
Can you also send the Flume agent logs? Did you check the contents of the files?

--
Hari Shreedharan
On Thursday, February 28, 2013 at 2:43 PM, Roshan Naik wrote:

> Would you be able to verify whether the same problem can be reproduced by using the memory channel instead in a test setup? (A sketch of that swap follows the quoted message.)
>
>
> > On Wed, Feb 27, 2013 at 11:37 AM, Sagar Mehta <[EMAIL PROTECTED]> wrote:
> > Hi Guys,
> >
> > I'm using Flume NG and it is working pretty well, except for a weird situation I observed lately. In essence, I'm using an exec source to run tail -F on a logfile, with two HDFS sinks and a file channel.
> >
> > However, I have observed that when the source [the logfile of a Jetty-based collector] is idle, that is, when no new events are pushed to the logfile, Flume NG seems to replay the same set of events.
> >
> > For example, collector110 received no events for two consecutive hours; below are the corresponding Flume-written files at the HDFS sink:
> >
> > hadoop@jobtracker301:/home/hadoop/sagar$ hls /ngpipes-raw-logs/2013-02-27/1400/collector110*
> > -rw-r--r--   3 hadoop supergroup        441 2013-02-27 14:20 /ngpipes-raw-logs/2013-02-27/1400/collector110.ngpipes.sac.ngmoco.com.1361974853210.gz
> > -rw-r--r--   3 hadoop supergroup        441 2013-02-27 14:50 /ngpipes-raw-logs/2013-02-27/1400/collector110.ngpipes.sac.ngmoco.com.1361976653432.gz
> >
> > hadoop@jobtracker301:/home/hadoop/sagar$ hls /ngpipes-raw-logs/2013-02-27/1500/collector110*
> > -rw-r--r--   3 hadoop supergroup        441 2013-02-27 15:20 /ngpipes-raw-logs/2013-02-27/1500/collector110.ngpipes.sac.ngmoco.com.1361978454123.gz
> > -rw-r--r--   3 hadoop supergroup        441 2013-02-27 15:50 /ngpipes-raw-logs/2013-02-27/1500/collector110.ngpipes.sac.ngmoco.com.1361980254338.gz
> >
> > hadoop@jobtracker301:/home/hadoop/sagar$ md5sum *
> > c7360ef5c8deaee3ce9f4c92e9d9be63  collector110.ngpipes.sac.ngmoco.com.1361974853210.gz
> > c7360ef5c8deaee3ce9f4c92e9d9be63  collector110.ngpipes.sac.ngmoco.com.1361976653432.gz
> > c7360ef5c8deaee3ce9f4c92e9d9be63  collector110.ngpipes.sac.ngmoco.com.1361978454123.gz
> > c7360ef5c8deaee3ce9f4c92e9d9be63  collector110.ngpipes.sac.ngmoco.com.1361980254338.gz
> >
> >
> > As you can see above, the md5sums all match.
> >
> > I'm using a file channel, which has checkpoints, so I'm not sure what is going on. Incidentally, the timestamps of consecutive replays are exactly 30 minutes apart.
> >
> > Is this a known bug or am I missing something?
> >
> > Below is my Flume config file
> >
> > smehta@collector110:/opt/flume/conf$ cat hdfs.conf
> > # HDFS sinks to write events to HDFS on the test cluster
> > # A file channel to connect the above source and sinks
> >
> > # Name the components on this agent
> > collector110.sources = source1
> > collector110.sinks = sink1 sink2
> > collector110.channels = channel1 channel2
> >
> > # Configure the source
> > collector110.sources.source1.type = exec
> > collector110.sources.source1.command = tail -F /opt/jetty/logFile.log
> >
> > # Configure the interceptors
> > collector110.sources.source1.interceptors = TimestampInterceptor HostInterceptor
> >
> > # We use the Timestamp interceptor to get timestamps of when flume receives events
> > # This is used for figuring out the bucket to which an event goes
> > collector110.sources.source1.interceptors.TimestampInterceptor.type = timestamp
> >
> > # We use the Host interceptor to populate the host header with the fully qualified domain name of the collector.
> > # That way we know which file in the sink represents which collector.
> > collector110.sources.source1.interceptors.HostInterceptor.type = org.apache.flume.interceptor.HostInterceptor$Builder
> > collector110.sources.source1.interceptors.HostInterceptor.preserveExisting = false
> > collector110.sources.source1.interceptors.HostInterceptor.useIP = false
> > collector110.sources.source1.interceptors.HostInterceptor.hostHeader = host
> >
> >
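[Editor's note] The config above is cut off before the channel and sink definitions. As a reading aid, here is a minimal sketch of what the missing stanzas could look like, based on the setup Sagar describes (two HDFS sinks, a file channel) and the HDFS listing above. Every path, directory, and value below is an illustrative assumption inferred from the thread, not Sagar's actual configuration.

# HYPOTHETICAL completion of the truncated config -- values inferred
# from the thread, not taken from the original file.

# Feed both channels from the source (default replicating selector)
collector110.sources.source1.channels = channel1 channel2

# File channels with checkpointing (directories are guesses)
collector110.channels.channel1.type = file
collector110.channels.channel1.checkpointDir = /opt/flume/channel1/checkpoint
collector110.channels.channel1.dataDirs = /opt/flume/channel1/data
collector110.channels.channel2.type = file
collector110.channels.channel2.checkpointDir = /opt/flume/channel2/checkpoint
collector110.channels.channel2.dataDirs = /opt/flume/channel2/data

# An HDFS sink writing gzip files bucketed by hour, matching the
# /ngpipes-raw-logs/2013-02-27/1400/collector110...gz layout above
collector110.sinks.sink1.type = hdfs
collector110.sinks.sink1.channel = channel1
collector110.sinks.sink1.hdfs.path = /ngpipes-raw-logs/%Y-%m-%d/%H00
collector110.sinks.sink1.hdfs.filePrefix = %{host}
collector110.sinks.sink1.hdfs.codeC = gzip
collector110.sinks.sink1.hdfs.fileType = CompressedStream
# A 1800-second roll interval would match the 30-minute spacing
# Sagar observed; disable size- and count-based rolling
collector110.sinks.sink1.hdfs.rollInterval = 1800
collector110.sinks.sink1.hdfs.rollSize = 0
collector110.sinks.sink1.hdfs.rollCount = 0
# sink2 would mirror sink1, draining channel2

The %Y-%m-%d/%H00 escapes and the %{host} prefix rely on the timestamp and host headers that the two interceptors in the quoted config populate.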
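[Editor's note] For the isolation test Roshan suggests, the file channels can be swapped for memory channels with a small change per channel. A sketch, reusing the hypothetical names above; capacity values are illustrative:

# HYPOTHETICAL test variant: memory channels instead of file channels
# (checkpointDir/dataDirs no longer apply and should be removed)
collector110.channels.channel1.type = memory
collector110.channels.channel1.capacity = 10000
collector110.channels.channel1.transactionCapacity = 1000
# Repeat the same three lines for channel2

If the duplicates disappear with memory channels, the file channel's checkpoint/replay path is the likely suspect; if they persist, attention shifts to the exec source or the HDFS sinks.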
Further replies in this thread:

Sagar Mehta 2013-03-04, 22:42
Sagar Mehta 2013-03-04, 23:06
Hari Shreedharan 2013-03-05, 00:13
Mike Percy 2013-03-05, 03:10
Sagar Mehta 2013-03-05, 17:53