Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Flume >> mail # user >> Flume Ng replaying events when the source is idle


+
Sagar Mehta 2013-02-27, 19:37
Copy link to this message
-
Re: Flume Ng replaying events when the source is idle
would you be able to you verify if the same problem can be reproduced by
using the memory channel instead in a test setup ?
On Wed, Feb 27, 2013 at 11:37 AM, Sagar Mehta <[EMAIL PROTECTED]> wrote:

> Hi Guys,
>
> I'm using Flume-Ng and it is working pretty well except for a weird
> situation which I observed lately. In essence I'm using an exec source for
> doing  tail -F on a logfile and using two HDFS sinks with a File channel.
>
> However I have observed that when the source [ logfile of a jetty based
> collector] is idle - that is no new events are pushed to the logFile,
> FlumeNg seems to replay the same set of events.
>
> For example collector110 received no events for 2 subsequent hours and
> below are the corresponding Flume written files at the HDFS sink
>
> hadoop@jobtracker301:/home/hadoop/sagar$ hls
> /ngpipes-raw-logs/2013-02-27/1400/collector110*
> -rw-r--r--   3 hadoop supergroup        441 2013-02-27 14:20
> /ngpipes-raw-logs/2013-02-27/1400/collector110.ngpipes.sac.ngmoco.com.1361974853210.gz
> -rw-r--r--   3 hadoop supergroup        441 2013-02-27 14:50
> /ngpipes-raw-logs/2013-02-27/1400/collector110.ngpipes.sac.ngmoco.com.1361976653432.gz
>
> hadoop@jobtracker301:/home/hadoop/sagar$ hls
> /ngpipes-raw-logs/2013-02-27/1500/collector110*
> -rw-r--r--   3 hadoop supergroup        441 2013-02-27 15:20
> /ngpipes-raw-logs/2013-02-27/1500/collector110.ngpipes.sac.ngmoco.com.1361978454123.gz
> -rw-r--r--   3 hadoop supergroup        441 2013-02-27 15:50
> /ngpipes-raw-logs/2013-02-27/1500/collector110.ngpipes.sac.ngmoco.com.1361980254338.gz
>
> hadoop@jobtracker301:/home/hadoop/sagar$ md5sum *
> c7360ef5c8deaee3ce9f4c92e9d9be63
>  collector110.ngpipes.sac.ngmoco.com.1361974853210.gz
> c7360ef5c8deaee3ce9f4c92e9d9be63
>  collector110.ngpipes.sac.ngmoco.com.1361976653432.gz
> c7360ef5c8deaee3ce9f4c92e9d9be63
>  collector110.ngpipes.sac.ngmoco.com.1361978454123.gz
> c7360ef5c8deaee3ce9f4c92e9d9be63
>  collector110.ngpipes.sac.ngmoco.com.1361980254338.gz
>
> As you can see above the md5sums match.
>
> I'm using a File channel which has checkpoints, so I'm not sure what is
> going on. Btw looks like the difference in timestamps of the two replays is
> exactly 30 mins.
>
> *Is this a known bug or am I missing something?*
> *
> *
> *Below is my Flume config file*
>
> smehta@collector110:/opt/flume/conf$ cat hdfs.conf
> # An hdfs sink to write events to the hdfs on the test cluster
> # A memory based channel to connect the above source and sink
>
> # Name the components on this agent
> collector110.sources = source1
> collector110.sinks = sink1 sink2
> collector110.channels = channel1 channel2
>
> # Configure the source
> collector110.sources.source1.type = exec
> collector110.sources.source1.command = tail -F /opt/jetty/logFile.log
>
> # Configure the interceptors
> collector110.sources.source1.interceptors = TimestampInterceptor
> HostInterceptor
>
> # We use the Timestamp interceptor to get timestamps of when flume
> receives events
> # This is used for figuring out the bucket to which an event goes
> collector110.sources.source1.interceptors.TimestampInterceptor.type > timestamp
>
> # We use the Host interceptor to populate the host header with the fully
> qualified domain name of the collector.
> # That way we know which file in the sink respresents which collector.
> collector110.sources.source1.interceptors.HostInterceptor.type > org.apache.flume.interceptor.HostInterceptor$Builder
> collector110.sources.source1.interceptors.HostInterceptor.preserveExisting
> = false
> collector110.sources.source1.interceptors.HostInterceptor.useIP = false
> collector110.sources.source1.interceptors.HostInterceptor.hostHeader = host
>
>
> # Configure the sink
>
> collector110.sinks.sink1.type = hdfs
>
> # Configure the bucketing
> collector110.sinks.sink1.hdfs.path=hdfs://
> namenode3001.ngpipes.milp.ngmoco.com:9000/ngpipes-raw-logs/%Y-%m-%d/%H00
>
> # Prefix the file with the source so that we know where the events in the
> file came from
+
Hari Shreedharan 2013-02-28, 22:59
+
Sagar Mehta 2013-03-04, 22:42
+
Sagar Mehta 2013-03-04, 23:06
+
Hari Shreedharan 2013-03-05, 00:13
+
Mike Percy 2013-03-05, 03:10
+
Sagar Mehta 2013-03-05, 17:53