Flume >> mail # user >> Dupes


I'm seeing a LOT of random dupes in some of my log files....

This is pretty consistent for one log in particular. The source file being
tail'ed averages ~20M per day, every day.  On the only sink (FILE_ROLL) the
resulting 24-hour log is 55M.  Some quick counts from grep'ing a random time
(e.g. 07:23) show the sink log with a dozen or so more lines carrying that
timestamp than the source has, minute after minute.
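
The per-minute grep counts described above could be automated along these lines. This is only a rough sketch: the function names are made up for illustration, and it assumes every log line carries an HH:MM timestamp somewhere in its text, which may not match the real log format.

```python
import re
from collections import Counter

# Assumption: each log line contains a timestamp with an HH:MM part, e.g. "07:23".
MINUTE_RE = re.compile(r"\b(?:[01]\d|2[0-3]):[0-5]\d\b")

def count_per_minute(lines):
    """Count lines per HH:MM timestamp (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = MINUTE_RE.search(line)
        if match:
            counts[match.group(0)] += 1
    return counts

def surplus_per_minute(source_lines, sink_lines):
    """Return {minute: extra line count} for minutes where the sink has more lines."""
    src = count_per_minute(source_lines)
    snk = count_per_minute(sink_lines)
    return {m: snk[m] - src[m] for m in sorted(snk) if snk[m] > src[m]}
```

Both arguments can be open file objects (for example `open("source.log")` and `open("sink.log")`, both placeholder names), since the functions only iterate over lines. Minutes near a roll boundary will show spurious differences unless the two files cover the same time window.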

This has been happening like clockwork every day for the last couple of
months, since I started using Flume on this box.

I did check that there wasn't another source from this or another server
sending to the same port...and the entries of the log file look proper for
that app.

The logs are not rolling at the same time on the source/sink and I've not
yet taken the time to set up copies of each beginning and ending at the same
times and run a diff against them, but a preliminary 'eyeball diff' just
shows dupes.  Note that on the source, a line with the exact same text may
legitimately appear more than once, as the logging mechanism does not record
timestamps more precisely than hour/minute.
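
Because identical lines can legitimately repeat on the source, a plain diff or `sort | uniq` would overstate the dupes. One possible approach, sketched here with a made-up function name and assuming both copies have been trimmed to the same time window, is a multiset comparison that reports only the copies the sink has beyond what the source has:

```python
from collections import Counter

def extra_copies(source_lines, sink_lines):
    """Return a Counter of lines that occur more often in the sink than in
    the source, with the number of surplus copies (hypothetical helper)."""
    src = Counter(line.rstrip("\n") for line in source_lines)
    snk = Counter(line.rstrip("\n") for line in sink_lines)
    # Counter subtraction keeps only positive counts, so lines that repeat
    # equally often on both sides drop out.
    return snk - src
```

Swapping the arguments, `extra_copies(sink_lines, source_lines)`, would list lines present on the source but missing from the sink, which checks for drops in the same pass.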

All in all, dupes are better than drops, but is there anything in
particular I should look for to try to find the cause of and eliminate this?
Thanks in advance for any thoughts,
Dave