Flume >> mail # user >> how to implement a tail or tailDir of flume-ng on windows?


Re: how to implement a tail or tailDir of flume-ng on windows?
Hi Juhani,
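I wrote a Python script, tail.py, as below:
import sys
import time

def tail_f(file):
  # Poll the open file and yield each new line as it is appended.
  interval = 1.0  # seconds to wait when no new data is available

  while True:
    where = file.tell()
    line = file.readline()
    if not line:
      # Nothing new yet: wait, then return to the last position and retry.
      time.sleep(interval)
      file.seek(where)
    else:
      yield line

# The file to follow is given as the first command-line argument.
for line in tail_f(open(sys.argv[1])):
  print line,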
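
One thing that may be worth checking with a script like this: when Python's stdout is a pipe rather than a console, output is normally block-buffered, so lines printed without an explicit flush may sit in the buffer instead of reaching the process that launched the script. Below is a minimal sketch of the same polling loop with a flush after every line. The names follow, emit and max_idle_polls are made up for this example, and whether buffering is the actual cause here is an assumption, not something confirmed in this thread.

```python
import sys
import time


def follow(handle, interval=1.0, max_idle_polls=None):
    """Yield lines appended to an open file, polling every `interval` seconds.

    `max_idle_polls` is a hypothetical knob added for this sketch: stop after
    that many consecutive empty polls (None means follow forever).
    """
    idle = 0
    while max_idle_polls is None or idle < max_idle_polls:
        where = handle.tell()
        line = handle.readline()
        if line:
            idle = 0
            yield line
        else:
            idle += 1
            handle.seek(where)  # return to the last known position before retrying
            time.sleep(interval)


def emit(line, out=sys.stdout):
    """Write one line and flush immediately so a piped reader sees it at once."""
    out.write(line)
    out.flush()


# Only run as a tail when a file name is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as log_file:
        for new_line in follow(log_file):
            emit(new_line)
```

Running the interpreter with the -u option (unbuffered output) is another way to get a similar effect without changing the script.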

tail.bat:
C:\Python27\python.exe D:\apache-flume-1.3.1-bin\tail.py d:\data.log

I changed the conf file to:
agent1.sources.userlogsrc.type = exec
agent1.sources.userlogsrc.command = "D:\\apache-flume-1.3.1-bin\\bin\\tail.bat"

This node tails the file; its sink is avro, and it sends to another node whose source is avro.
When I run my flume.bat, it gives no error and I can see the connection is OK,
but no data is sent to flume-ng.

If I change the config file to:
agent1.sources.userlogsrc.command = "C:\\Python27\\python.exe D:\\apache-flume-1.3.1-bin\\tail.py d:\\data.log"

and run flume.bat, it reports this error:
2013-02-21 15:21:08,622 (pool-4-thread-1) [ERROR - org.apache.flume.source.ExecSource$ExecRunnable.run(ExecSource.java:284)] Failed while running command: "C:\Python27\python.exe D:\apache-flume-1.3.1-bin\tail.py d:\data.log"
java.io.IOException: Cannot run program ""C:\Python27\python.exe": CreateProcess error=2, ?????????
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
        at org.apache.flume.source.ExecSource$ExecRunnable.run(ExecSource.java:259)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: CreateProcess error=2, ?????????
        at java.lang.ProcessImpl.create(Native Method)
        at java.lang.ProcessImpl.<init>(ProcessImpl.java:81)
        at java.lang.ProcessImpl.start(ProcessImpl.java:30)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
        ... 7 more
2013-02-21 15:21:08,651 (pool-4-thread-1) [INFO - org.apache.flume.source.ExecSource$ExecRunnable.run(ExecSource.java:307)] Command ["C:\Python27\python.exe D:\apache-flume-1.3.1-bin\tail.py d:\data.log"] exited with -1073741824

I don't know why the exec source can't run the Python program.
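
A possible reading of the stack trace: the program name in "Cannot run program" begins with a literal double quote, which suggests the quotes from the config value were passed through as part of the command. The sketch below imitates that under one assumption, namely that the exec source splits the command string on whitespace and does not interpret quotes; the variable names are only for illustration.

```python
import re

# The command value as Java would see it after reading the properties file
# (each "\\" in the file becomes a single backslash).
command = '"C:\\Python27\\python.exe D:\\apache-flume-1.3.1-bin\\tail.py d:\\data.log"'

# Assumption: the command is tokenized on runs of whitespace and the pieces
# are handed to ProcessBuilder with any quote characters left in place.
argv = re.split(r"\s+", command)
program = argv[0]
print(program)  # the executable name still carries the leading quote

# The same value without the surrounding quotes splits into a clean
# executable path followed by its two arguments.
fixed_argv = re.split(r"\s+", command.strip('"'))
print(fixed_argv[0])
```

If that assumption holds, writing the command value without the surrounding double quotes may be enough for the process to start; paths that contain spaces would need a different approach.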

Thanks,
Andy
2013/2/21 Juhani Connolly <[EMAIL PROTECTED]>

> You'd want to just periodically stat the file to be tailed, checking for
> change in last modified/size, and read the difference out of it. You could
> always download the source for tail itself and see how it does it:
> http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/tail.c
>
> If you're going to write this to feed data to flume you're better off
> having it send data over thrift to flume so you can resend it on failures.
>
>
> On 02/21/2013 12:37 PM, 周梦想 wrote:
>
>> Hello,
>>
>> There isn't a tail or tailDir source in flume-ng.
>> The exec source can run the tail command on Linux,
>> but there is no tail command on Windows, so I have to write some code
>> to do the same work.
>> I want to read a file and, whenever new lines are added to it, send those
>> lines to flume-ng.
>>
>> Can someone give me some advice?
>>
>> Thanks,
>> Andy
>>
>>
>
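
Juhani's suggestion in the quoted reply (periodically stat the file, check for a change in size, and read the difference) could look roughly like the sketch below. The function name read_new_bytes and the choice to restart from offset 0 when the file shrinks are assumptions made for this example; it checks only the size, not the last-modified time, and it is not how GNU tail or Flume does it.

```python
import os


def read_new_bytes(path, offset):
    """Return (data, new_offset) for bytes appended to `path` since `offset`.

    If the file is now shorter than `offset`, assume it was truncated or
    rotated and start again from the beginning.
    """
    size = os.stat(path).st_size
    if size < offset:
        offset = 0
    if size == offset:
        return b"", offset
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read(size - offset)
    return data, offset + len(data)
```

A caller would keep the returned offset, sleep for an interval, and call the function again; keeping the offset also makes it possible to resend the same range after a failed delivery.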