Wang, Yongkun | Yongkun |... 2013-08-19, 02:29
Brock Noland 2013-08-19, 13:08
Wang, Yongkun | Yongkun |... 2013-08-20, 08:43
Brock Noland 2013-08-20, 14:58
Paul Chavez 2013-08-20, 16:15
Wang, Yongkun | Yongkun |... 2013-08-23, 05:26
Paul Chavez 2013-08-23, 21:26
I've set up something similar with the spooling directory source. A script scheduled on the app server creates an incremental file every minute and drops it into the spool directory for processing. The use case is web logs that roll over daily, where we want events in 'near' real time. We didn't want to use the exec source because it gives no delivery guarantee. With a spooling directory source, if the Flume agent stops processing, the incremental files at least stay in the spool dir until it's back up.
Hope that helps,
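The incremental-file approach described above could be sketched roughly as follows. This is only an illustration, not the script used in the reply: the function name export_increment, the staging directory, and the "weblog-" file prefix are all hypothetical. It assumes the staging and spool directories are on the same filesystem, so that the rename is atomic and the spooling directory source never sees a half-written file.

```python
import os
import uuid


def export_increment(log_path, staging_dir, spool_dir, offset):
    """Copy complete lines appended to log_path since `offset` into a new
    file in spool_dir. Returns the offset to use on the next run."""
    size = os.path.getsize(log_path)
    if size < offset:
        # The log is shorter than last time: assume it rolled over.
        offset = 0

    with open(log_path, "rb") as src:
        src.seek(offset)
        data = src.read(size - offset)

    # Ship only complete lines; a partial last line waits for the next run.
    end = data.rfind(b"\n") + 1
    if end == 0:
        return offset

    # Hypothetical naming scheme; the name only has to be unique, because
    # files must not be modified or reused once they are in the spool dir.
    name = "weblog-%s.log" % uuid.uuid4().hex
    staged = os.path.join(staging_dir, name)
    with open(staged, "wb") as dst:
        dst.write(data[:end])

    # Rename into the spool directory only after the file is fully written.
    os.rename(staged, os.path.join(spool_dir, name))
    return offset + end
```

A scheduler such as cron could call this once a minute, keeping the returned offset in a small state file between runs.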
From: Wang, Yongkun | Yongkun | BDD [mailto:[EMAIL PROTECTED]]
Sent: Sunday, August 18, 2013 7:30 PM
To: [EMAIL PROTECTED]
Subject: sleep() in script doesn't work when called by exec Source
I am testing with apache-flume-1.4.0-bin.
I wrote a naive Python script for the exec source that throttles by calling the sleep() function.
But sleep() doesn't work when the script is called by the exec source.
Any ideas about this, or do you have a simple solution for throttling that avoids a custom source?
agent.sources = src1
agent.sources.src1.type = exec
agent.sources.src1.command = read-file-throttle.py
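import sys
import time

count = 0
pre_time = time.time()
with open("apache.log") as infile:
    for line in infile:
        line = line.strip()
        # Emit the event on stdout (assumed; the output line is missing from the archived post).
        sys.stdout.write(line + "\n")
        count += 1
        if count % 50000 == 0:
            now_time = time.time()
            diff = now_time - pre_time
            if diff < 10:
                #print "sleeping %s seconds ..." % (diff)
                time.sleep(10 - diff)  # assumed; the sleep call is missing from the archived post
            # Start the next 10-second window after any sleep has finished.
            pre_time = time.time()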
Thank you very much.
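For reference, the throttling idea in the script above can be written as a small generator. This is a sketch under assumptions, not a confirmed fix for the reported problem: the names throttled and emit are hypothetical, and the explicit flush is there because Python block-buffers stdout when it is connected to a pipe, which is one possible reason the pauses would not be visible downstream of an exec source.

```python
import sys
import time


def throttled(lines, batch_size=50000, min_seconds=10.0,
              clock=time.time, sleep=time.sleep):
    """Yield lines unchanged, pausing so that every batch of `batch_size`
    lines takes at least `min_seconds` of wall-clock time."""
    batch_start = clock()
    for count, line in enumerate(lines, start=1):
        yield line
        if count % batch_size == 0:
            elapsed = clock() - batch_start
            if elapsed < min_seconds:
                sleep(min_seconds - elapsed)
            # Start timing the next batch after the pause.
            batch_start = clock()


def emit(lines, out=sys.stdout, **throttle_args):
    """Write throttled lines to `out`, flushing after each one so the
    reader of the pipe sees output at the throttled rate."""
    for line in throttled(lines, **throttle_args):
        out.write(line)
        out.flush()
```

With this in place, the body of the script would reduce to opening "apache.log" and passing the file object to emit(). The clock and sleep parameters exist only so the timing logic can be exercised without real delays.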
Wang, Yongkun | Yongkun |... 2013-08-20, 08:44