Re: flume tail source problem and performance
Yes, you can; the Flume plugin framework provides an easy way to implement and
plug in your own sources, decorators, and sinks.
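
For example, once your plugin jar is registered with the node, a custom source
can be wired into the dataflow spec just like the built-in ones. A minimal,
untested sketch (myGameLogSource is only a placeholder name; the sink, host and
port are taken from your earlier config):

config [ag1, myGameLogSource("H:/game.log"), agentSink("hadoop48", 35853)]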

-JS

On 2/4/13 5:07 PM, Andy Zhou wrote:
> Hi JS,
>
> Thank you for your reply. So there are significant shortcomings in
> collecting logs with Flume's tail source. Can I write my own agent that
> sends logs via the Thrift protocol directly to the collector server?
>
> Best Regards,
> Andy Zhou
>
>
> 2013/2/4 Jeong-shik Jang <[EMAIL PROTECTED]>
>
>     Hi Andy,
>
>     1. "startFromEnd=true" in your source configuration means data
>     missing can happen at restart in tail side because flume will
>     ignore any data event generated during restart and start at the
>     end all the time.
>     2. With agentSink, data duplication can happen due to ack delay
>     from master or at agent restart.
>
>     I think that is why Flume NG no longer ships a tail source and
>     instead lets users handle it with their own script or program;
>     tailing is a tricky job.
>
>     My suggestion is to use agentBEChain in the agent tier and DFO in
>     the collector tier (a rough example is sketched below); you can
>     still lose some data during failover when a failure occurs.
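>
>     For example, with two collectors it could look roughly like this
>     (untested; hadoop49 is just a placeholder for a second collector,
>     and the DFO decorator on the collector side is left out of this
>     sketch):
>
>     config [ag1, tail("/home/zhouhh/game.log", startFromEnd=true), agentBEChain("hadoop48:35853", "hadoop49:35853")]
>     config [co1, collectorSource(35853), collectorSink("hdfs://hadoop48:54310/user/flume/%y%m/%d", "%{host}-", 5000, raw)]
>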
>     To minimize loss and duplication, implementing a checkpoint
>     function in tail can also help.
>
>     Having a monitoring system that detects failures is very important
>     as well, so that you can notice a failure and recover quickly.
>
>     -JS
>
>
>     On 2/4/13 4:27 PM, Andy Zhou wrote:
>>     Hi JS,
>>     We can't accept agentBESink. Because these logs are important for
>>     data analysis, we can't tolerate any errors in the data; neither
>>     data loss nor duplication is acceptable.
>>     One agent's configuration is:
>>     tail("H:/game.log", startFromEnd=true) agentSink("hadoop48", 35853)
>>
>>
>>     Every time this Windows agent restarts, it resends all the data
>>     to the collector server.
>>     If for some reason we restart the agent node, we can't recover the
>>     mark in the log showing how far the agent has already sent.
>>
>>
>>     2013/1/29 Jeong-shik Jang <[EMAIL PROTECTED]>
>>
>>         Hi Andy,
>>
>>         As you set the startFromEnd option to true, the resend is
>>         probably caused by the DFO mechanism (agentDFOSink); when you
>>         restart a Flume node in DFO mode, all events in intermediate
>>         stages (logged, writing, sending, and so on) roll back to the
>>         logged stage, which means resending and duplication.
>>
>>         Also, for better performance, you may want to use agentBESink
>>         instead of agentDFOSink.
>>         If you have multiple collectors, I recommend using agentBEChain
>>         for failover in case of a failure in the collector tier.
>>
>>         -JS
>>
>>
>>         On 1/29/13 4:29 PM, Alexander Alten-Lorenz wrote:
>>
>>             Hi,
>>
>>             you could use tail -F, but this depends on the external
>>             source; Flume has no control over it. You can write your
>>             own script and include it (see the sketch below).
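>>
>>             A rough sketch, assuming an exec-style source such as
>>             execStream is available in your build (untested; the sink
>>             is taken from your config):
>>
>>             config [ag1, execStream("tail -F /home/zhouhh/game.log"), agentDFOSink("hadoop48", 35853)]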
>>
>>             What is the content of the /tmp/flume/agent/agent*.*/
>>             directories? Are the sent and sending directories clean?
>>
>>             - Alex
>>
>>             On Jan 29, 2013, at 8:24 AM, Andy Zhou
>>             <[EMAIL PROTECTED]> wrote:
>>
>>                 Hello,
>>                 1. I want to tail a log source and write it to HDFS.
>>                 Below is the configuration:
>>                 config [ag1, tail("/home/zhouhh/game.log", startFromEnd=true), agentDFOSink("hadoop48", 35853)]
>>                 config [ag2, tail("/home/zhouhh/game.log", startFromEnd=true), agentDFOSink("hadoop48", 35853)]
>>                 config [co1, collectorSource(35853), [collectorSink("hdfs://hadoop48:54310/user/flume/%y%m/%d", "%{host}-", 5000, raw), collectorSink("hdfs://hadoop48:54310/user/flume/%y%m", "%{host}-", 10000, raw)]]
>>
>>
>>                 I found that if I restart the agent node, it will resend
Jeong-shik Jang / [EMAIL PROTECTED]
Gruter, Inc., R&D Team Leader
www.gruter.com
Enjoy Connecting