Re: flume tail source problem and performance
Yes, you can; the Flume plugin framework provides an easy way to implement and
plug in your own sources, decorators, and sinks.

-JS

On 2/4/13 5:07 PM, Andy Zhou wrote:
> Hi JS,
>
> Thank you for your reply. So there is a significant shortcoming in
> collecting logs with Flume's tail source. Can I write my own agent to
> send logs via the Thrift protocol directly to the collector server?
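Andy's idea of a hand-rolled agent can be illustrated with a minimal sketch. This is a hypothetical plain-TCP forwarder, not Flume's actual Thrift wire protocol; a real agent would use the Thrift client generated from Flume's IDL, and the host/port are placeholders from the thread's configs:

```python
import socket

def ship_lines(lines, host="hadoop48", port=35853):
    """Forward log lines to a collector over a plain TCP connection.

    Illustration only: Flume's collectorSource expects its own Thrift
    wire format, so a production agent would speak Thrift (generated
    from Flume's IDL) rather than raw newline-delimited TCP.
    """
    with socket.create_connection((host, port)) as conn:
        for line in lines:
            conn.sendall(line.encode("utf-8") + b"\n")
```

The same loop could read from the checkpointed tailer discussed later in the thread, so the agent controls exactly what has been sent.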
>
> Best Regards,
> Andy Zhou
>
>
>     2013/2/4 Jeong-shik Jang <[EMAIL PROTECTED]>
>
>     Hi Andy,
>
>     1. With "startFromEnd=true" in your source configuration, data
>     loss can happen at restart on the tail side, because Flume will
>     ignore any events generated while the node was down and always
>     start from the end of the file.
>     2. With agentSink, data duplication can happen due to ack delay
>     from the master, or at agent restart.
>
>     I think that is why Flume-NG no longer supports tail and instead
>     lets users handle it with their own script or program; tailing is
>     a tricky job.
>
>     My suggestion is to use agentBEChain in the agent tier and DFO in
>     the collector tier; you can still lose some data during failover
>     when a failure occurs.
>     To minimize loss and duplication, implementing a checkpoint
>     function in the tailer can also help.
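The checkpoint function JS suggests can be sketched as follows: persist the byte offset of the last data read, and resume from it on restart. This is an illustrative sketch, not Flume code; path handling and rotation detection are simplified:

```python
import os

def tail_with_checkpoint(log_path, checkpoint_path):
    """Return the log lines added since the last call, resuming from the
    byte offset saved in checkpoint_path so an agent restart neither
    re-sends old lines nor skips lines written while it was down."""
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            offset = int(f.read().strip() or 0)
    # If the log shrank (rotation/truncation), start over from the top.
    if offset > os.path.getsize(log_path):
        offset = 0
    with open(log_path, "rb") as f:
        f.seek(offset)
        new_lines = f.read().splitlines()
        new_offset = f.tell()
    with open(checkpoint_path, "w") as f:
        f.write(str(new_offset))
    return [line.decode("utf-8") for line in new_lines]
```

Writing the checkpoint only after the collector acknowledges a batch would trade a little duplication at restart for no loss.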
>
>     Having a monitoring system that detects failures is very important
>     as well, so that you can notice a failure and react to recover
>     quickly.
>
>     -JS
>
>
>     On 2/4/13 4:27 PM, Andy Zhou wrote:
>>     Hi JS,
>>     We can't accept agentBESink, because these logs are important for
>>     data analysis and we can't tolerate any errors in the data:
>>     neither loss nor duplication is acceptable.
>>     One agent's configuration is:
>>     tail("H:/game.log", startFromEnd=true) agentSink("hadoop48", 35853)
>>
>>
>>     Every time this Windows agent restarts, it resends all the
>>     data to the collector server.
>>     If for some reason we restart the agent node, we can't get the
>>     mark in the log showing how far the agent has sent.
>>
>>
>>     2013/1/29 Jeong-shik Jang <[EMAIL PROTECTED]>
>>
>>         Hi Andy,
>>
>>         Since you set the startFromEnd option to true, the resend is
>>         likely caused by the DFO mechanism (agentDFOSink); when you
>>         restart a Flume node in DFO mode, all events in intermediate
>>         stages (logged, writing, sending, and so on) roll back to the
>>         logged stage, which means resending and duplication.
>>
>>         Also, for better performance, you may want to use agentBESink
>>         instead of agentDFOSink.
>>         If you have multiple collectors, I recommend using
>>         agentBEChain for failover in case of a failure in the
>>         collector tier.
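For reference, a chained best-effort sink in the Flume OG configuration DSL might look like the following; hadoop49 is a hypothetical second collector added for illustration, following the agentBEChain form JS mentions:

```
config [ag1, tail("/home/zhouhh/game.log", startFromEnd=true),
        agentBEChain("hadoop48:35853", "hadoop49:35853")]
```

If hadoop48 becomes unreachable, events are sent best-effort to hadoop49 instead.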
>>
>>         -JS
>>
>>
>>         On 1/29/13 4:29 PM, Alexander Alten-Lorenz wrote:
>>
>>             Hi,
>>
>>             you could use tail -F, but that depends on the external
>>             source, which Flume has no control over. You can write
>>             your own script and include it.
>>
>>             What's the content of the /tmp/flume/agent/agent*.*/
>>             directories? Are the sent and sending directories clean?
>>
>>             - Alex
>>
>>             On Jan 29, 2013, at 8:24 AM, Andy Zhou
>>             <[EMAIL PROTECTED]> wrote:
>>
>>                 hello,
>>                 1. I want to tail a log source and write it to HDFS.
>>                 Below is the configuration:
>>                 config [ag1,
>>                 tail("/home/zhouhh/game.log",startFromEnd=true),
>>                 agentDFOSink("hadoop48",35853) ;]
>>                 config [ag2,
>>                 tail("/home/zhouhh/game.log",startFromEnd=true),
>>                 agentDFOSink("hadoop48",35853) ;]
>>                 config [co1, collectorSource( 35853 ), [collectorSink(
>>                 "hdfs://hadoop48:54310/user/flume/%y%m/%d","%{host}-",5000,raw),collectorSink(
>>                 "hdfs://hadoop48:54310/user/flume/%y%m","%{host}-",10000,raw)]]
>>
>>
>>                 I found if I restart the agent node, it will resend
Jeong-shik Jang / [EMAIL PROTECTED]
Gruter, Inc., R&D Team Leader
www.gruter.com
Enjoy Connecting