Flume >> mail # user >> Flume Ng replaying events when the source is idle


Re: Flume Ng replaying events when the source is idle
Flume reloads the configuration file every time it is modified. Since Puppet rewrites it on every run, Flume reloads it each time. The events are probably being replayed because some transactions were left incomplete, or for a similar reason. The File Channel will not replay events once they have been completely persisted to HDFS and the transaction has been closed. If Puppet does not rewrite the config file, do you still see this issue?
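
To illustrate why a rewrite alone is enough to trigger a reload, here is a minimal Python sketch of a modification-time poller. It assumes the conf-file-poller decides to reload by comparing the file's last-modified time, which this thread does not confirm; `ConfigPoller` and `poll` are hypothetical names for illustration, not Flume APIs.

```python
import os
import tempfile


class ConfigPoller:
    """Hypothetical stand-in for a config poller (not a Flume class).

    Assumption: a reload is triggered whenever the file's modification
    time advances, regardless of whether the content changed.
    """

    def __init__(self, path):
        self.path = path
        self.last_mtime_ns = os.stat(path).st_mtime_ns

    def poll(self):
        """Return True if the file was modified since the last check."""
        mtime_ns = os.stat(self.path).st_mtime_ns
        if mtime_ns > self.last_mtime_ns:
            self.last_mtime_ns = mtime_ns
            return True
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conf = os.path.join(tmp, "hdfs.conf")
        content = "# placeholder config content\n"
        with open(conf, "w", encoding="utf-8") as f:
            f.write(content)

        poller = ConfigPoller(conf)
        print(poller.poll())  # nothing has touched the file yet

        # Rewrite the file with identical content, as a config-management
        # run might, and move the mtime forward explicitly so the demo does
        # not depend on the file system's timestamp resolution.
        with open(conf, "w", encoding="utf-8") as f:
            f.write(content)
        later = poller.last_mtime_ns + 5_000_000_000
        os.utime(conf, ns=(later, later))

        print(poller.poll())  # the rewrite is seen as a modification
```

Under that assumption, a rewrite that leaves the content byte-for-byte identical still counts as a modification, so the reload says nothing about whether the configuration actually changed.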

--
Hari Shreedharan
On Monday, March 4, 2013 at 3:06 PM, Sagar Mehta wrote:

> I think we found the issue. I'm not sure this is the root cause, but it looks highly correlated.
>
> We manage configs using Puppet, which currently runs in cron mode with the following configuration:
>
> ## puppetrun Cron Job
> 20,50 * * * *  root sleep $((RANDOM\%60)) > /dev/null 2>&1; puppet agent --onetime --no-daemonize --logdest syslog > /dev/null 2>&1
>
> Note the times at which Puppet runs alongside the timestamps in the listing below.
>
> Also, after combing through the Flume logs, we noticed that Flume reloads the configuration after every Puppet run:
>
> sagar@drspock ~/temp $ cat flume.log.2013-03-03 | egrep -i "reloading" | head -5
> 2013-03-03 00:20:44,174 [conf-file-poller-0] INFO  org.apache.flume.conf.properties.PropertiesFileConfigurationProvider - Reloading configuration file:/opt/flume/conf/hdfs.conf
> 2013-03-03 00:51:14,374 [conf-file-poller-0] INFO  org.apache.flume.conf.properties.PropertiesFileConfigurationProvider - Reloading configuration file:/opt/flume/conf/hdfs.conf
> 2013-03-03 01:21:15,072 [conf-file-poller-0] INFO  org.apache.flume.conf.properties.PropertiesFileConfigurationProvider - Reloading configuration file:/opt/flume/conf/hdfs.conf
> 2013-03-03 01:51:15,778 [conf-file-poller-0] INFO  org.apache.flume.conf.properties.PropertiesFileConfigurationProvider - Reloading configuration file:/opt/flume/conf/hdfs.conf
> 2013-03-03 02:20:46,481 [conf-file-poller-0] INFO  org.apache.flume.conf.properties.PropertiesFileConfigurationProvider - Reloading configuration file:/opt/flume/conf/hdfs.conf
>
>
> With our current setup, the Flume config file, /opt/flume/conf/hdfs.conf, is rewritten after every Puppet run due to variable interpolation in the template.
>
> We are still not sure what is causing Flume to reload the config file and, even if the file is reloaded, why the same events are getting replayed [the state should be saved somewhere on disk; I thought that's what the file channel is for].
>
> Any pointers/insights appreciated.
>
> Sagar
>
>
> On Mon, Mar 4, 2013 at 2:42 PM, Sagar Mehta <[EMAIL PROTECTED]> wrote:
> > Guys,
> >
> > Yes, this issue was also seen with the memory channel. In fact, when we moved to the file-based channel, we initially thought this issue wouldn't occur, since it stores checkpoints.
> >
> > Anyway, below are the files for collector110 [whose source didn't receive any events], and you can see the replays in the listing. I have attached the corresponding Flume log file for the same day.
> >
> > hadoop@jobtracker301:/home/smehta$ hls /ngpipes-raw-logs/2013-03-03/*/collector110* |  head -5
> > -rw-r--r--   3 hadoop supergroup       1594 2013-03-03 00:20 /ngpipes-raw-logs/2013-03-03/0000/collector110.ngpipes.sac.ngmoco.com.1362270044367.gz
> > -rw-r--r--   3 hadoop supergroup       1594 2013-03-03 00:51 /ngpipes-raw-logs/2013-03-03/0000/collector110.ngpipes.sac.ngmoco.com.1362271875065.gz
> > -rw-r--r--   3 hadoop supergroup       1594 2013-03-03 01:21 /ngpipes-raw-logs/2013-03-03/0100/collector110.ngpipes.sac.ngmoco.com.1362273675770.gz
> > -rw-r--r--   3 hadoop supergroup       1594 2013-03-03 01:51 /ngpipes-raw-logs/2013-03-03/0100/collector110.ngpipes.sac.ngmoco.com.1362275476474.gz
> > -rw-r--r--   3 hadoop supergroup       1594 2013-03-03 02:20 /ngpipes-raw-logs/2013-03-03/0200/collector110.ngpipes.sac.ngmoco.com.1362277246704.gz
> >
> >
> > Also, in the attached Flume log you can see the replays I'm talking about. Please note that the source received no events during this time.
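
The correlation between reloads and replayed files can be checked mechanically. The sketch below, in Python, pairs each "Reloading configuration file" line quoted above with the HDFS file name from the listing and computes the gap between them. It rests on two assumptions not stated in the thread: that the numeric suffix in each file name is an epoch timestamp in milliseconds, and that the flume.log timestamps are in UTC. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Timestamps copied from the flume.log excerpt quoted in this thread.
RELOAD_LINES = [
    "2013-03-03 00:20:44,174",
    "2013-03-03 00:51:14,374",
    "2013-03-03 01:21:15,072",
    "2013-03-03 01:51:15,778",
    "2013-03-03 02:20:46,481",
]

# File names copied from the HDFS listing quoted in this thread.
HDFS_FILE_NAMES = [
    "collector110.ngpipes.sac.ngmoco.com.1362270044367.gz",
    "collector110.ngpipes.sac.ngmoco.com.1362271875065.gz",
    "collector110.ngpipes.sac.ngmoco.com.1362273675770.gz",
    "collector110.ngpipes.sac.ngmoco.com.1362275476474.gz",
    "collector110.ngpipes.sac.ngmoco.com.1362277246704.gz",
]

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_reload_time(stamp):
    """Parse a log timestamp; assumes the log is written in UTC."""
    parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
    return parsed.replace(tzinfo=timezone.utc)


def file_time(name):
    """Read the numeric suffix before '.gz'; assumes epoch milliseconds."""
    millis = int(name.rsplit(".", 2)[-2])
    return EPOCH + timedelta(milliseconds=millis)


def reload_to_file_gaps(reload_lines, file_names):
    """Seconds between each reload and the file paired with it by position."""
    return [
        (file_time(name) - parse_reload_time(stamp)).total_seconds()
        for stamp, name in zip(reload_lines, file_names)
    ]


if __name__ == "__main__":
    for stamp, gap in zip(RELOAD_LINES, reload_to_file_gaps(RELOAD_LINES, HDFS_FILE_NAMES)):
        print(f"reload at {stamp}: file suffix is {gap:.3f}s later")
```

Under those assumptions, each of the five file suffixes falls less than one second after the corresponding reload, which is consistent with the reload being what triggers the replay. It does not by itself explain why the events are replayed.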