

Re: Flume Ng replaying events when the source is idle
Hi Guys,

I disabled Puppet on one of the boxes and, yes, there have been no replays since.

So essentially to summarize -

   - Puppet was re-writing the flume config file [expand a template] every
   30 mins.
   - This caused Flume to reload the configuration.
   - As a result the exec source command was re-executed, and tail printed the
   last few lines of the logFile again even though the file had not changed.
   [From Flume's point of view these are still new events :)]
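
One way to confirm the first bullet is to compare the rendered config from two
consecutive Puppet runs and see which lines actually vary. The following is
only a sketch under assumptions: the file names, the sample contents and the
config_changed helper are hypothetical, not taken from the real setup.

```shell
#!/bin/sh
# Sketch: detect whether two renderings of a config differ, and where.
# config_changed is a hypothetical helper: it returns 0 (true) when the
# two files differ and 1 when they are byte-identical.
config_changed() {
    ! cmp -s "$1" "$2"
}

workdir=$(mktemp -d)
# Hypothetical renderings from two consecutive Puppet runs; only the
# comment line varies between them.
printf '# rendered at 00:20\nagent.sources = r1\n' > "$workdir/before.conf"
printf '# rendered at 00:50\nagent.sources = r1\n' > "$workdir/after.conf"

if config_changed "$workdir/before.conf" "$workdir/after.conf"; then
    # Keep only the diff lines that show differing content.
    changed_lines=$(diff "$workdir/before.conf" "$workdir/after.conf" | grep '^[<>]')
    printf '%s\n' "$changed_lines"
fi
```

If the only differences are lines that do not matter to Flume, taking them out
of the template should stop the file from being rewritten on every run.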

*@Mike - Your suggestion totally makes sense and I'm going to try that now
in a test environment. Do you recommend it for production use?*
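
For anyone who wants to see the effect outside Flume first, here is a rough
sketch of what happens when the command is started twice on an unchanged file.
It drops -F so that each tail exits instead of following the file, and the
temp file and its contents are made up for the demo.

```shell
#!/bin/sh
# Sketch: simulate the exec source command being re-executed on a file
# that has not changed between the two runs.
log=$(mktemp)
printf 'event-1\nevent-2\nevent-3\n' > "$log"

# A plain tail prints the last lines every time it starts, so a second
# start emits the same lines again.
first_run=$(tail -n 10 "$log")
second_run=$(tail -n 10 "$log")

# With -n 0 nothing that is already in the file is printed at startup.
skip_run=$(tail -n 0 "$log")

if [ "$first_run" = "$second_run" ]; then
    echo "plain tail: second start repeated the same lines"
fi
if [ -z "$skip_run" ]; then
    echo "tail -n 0: nothing emitted at startup"
fi
rm -f "$log"
```

Note that with -n 0 any lines written while the command is being restarted
would presumably be skipped, which is worth keeping in mind when testing.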

Sagar

On Mon, Mar 4, 2013 at 7:10 PM, Mike Percy <[EMAIL PROTECTED]> wrote:

> Sagar,
> Just try "tail -F" on the same file over and over on the command line. It
> will display the last few lines.
>
> If you want to avoid this, try "tail -F -n 0 filename" and you should not
> see this. Every time you reload your configuration file, the specified
> command is re-executed by the source.
>
> Regards,
> Mike
>
>
>
> On Mon, Mar 4, 2013 at 4:13 PM, Hari Shreedharan <
> [EMAIL PROTECTED]> wrote:
>
>>  Flume will reload the configuration file every time it is modified.
>> Since puppet rewrites it, Flume reloads it. The events are probably
>> replayed because of the transactions being incomplete or something like
>> that. File Channel will not replay the events if they have been completely
>> persisted to HDFS and transaction closed. If puppet does not rewrite the
>> config file, do you see this issue?
>>
>> --
>> Hari Shreedharan
>>
>> On Monday, March 4, 2013 at 3:06 PM, Sagar Mehta wrote:
>>
>> I think we found the issue, not sure if this is the root cause but looks
>> highly correlated.
>>
>> So we manage configs using puppet, which currently runs in cron mode
>> with the following configuration
>>
>> ## puppetrun Cron Job
>> 20,50 * * * *  root sleep $((RANDOM\%60)) > /dev/null 2>&1; puppet agent
>> --onetime --no-daemonize --logdest syslog > /dev/null 2>&1
>>
>>  *Note - compare the times at which puppet runs with the time-stamps in
>> the listing below.*
>>
>> Also after combing through flume logs, we noticed Flume is reloading the
>> configuration after every puppet run
>>
>> sagar@drspock ~/temp $ cat flume.log.2013-03-03 | egrep -i "reloading" |
>> head -5
>> 2013-03-03 00:20:44,174 [conf-file-poller-0] INFO
>>  org.apache.flume.conf.properties.PropertiesFileConfigurationProvider -
>> Reloading configuration file:/opt/flume/conf/hdfs.conf
>> 2013-03-03 00:51:14,374 [conf-file-poller-0] INFO
>>  org.apache.flume.conf.properties.PropertiesFileConfigurationProvider -
>> Reloading configuration file:/opt/flume/conf/hdfs.conf
>> 2013-03-03 01:21:15,072 [conf-file-poller-0] INFO
>>  org.apache.flume.conf.properties.PropertiesFileConfigurationProvider -
>> Reloading configuration file:/opt/flume/conf/hdfs.conf
>> 2013-03-03 01:51:15,778 [conf-file-poller-0] INFO
>>  org.apache.flume.conf.properties.PropertiesFileConfigurationProvider -
>> Reloading configuration file:/opt/flume/conf/hdfs.conf
>> 2013-03-03 02:20:46,481 [conf-file-poller-0] INFO
>>  org.apache.flume.conf.properties.PropertiesFileConfigurationProvider -
>> Reloading configuration file:/opt/flume/conf/hdfs.conf
>>
>> The way we have our current setup, the flume config file
>> namely /opt/flume/conf/hdfs.conf is re-written after every puppet run due
>> to variable interpolation in the template.
>>
>>  *We are still not sure what is causing Flume to reload the config file,
>> and even if the file is reloaded, why are the same events getting replayed?
>> [the state should be saved somewhere on disk - that's what the file channel
>> is for, I thought]*
>>
>> Any pointers/insights appreciated.
>>
>> Sagar
>>
>>
>> On Mon, Mar 4, 2013 at 2:42 PM, Sagar Mehta <[EMAIL PROTECTED]> wrote:
>>
>> Guys,
>>
>> Yes, this issue was also seen with the memory channel. In fact when we moved
>> to the File based channel, we initially thought this issue wouldn't occur
>> since it stores checkpoints.
>>
>> Anyways below are all files for collector110 [whose source didn't receive