Flume, mail # user - flume tail source problem and performance

周梦想 2013-01-29, 07:24
1. I want to tail a log file and write it to HDFS. Below is my configuration:
config [ag1, tail("/home/zhouhh/game.log",startFromEnd=true),
  agentDFOSink("hadoop48",35853) ;]
config [ag2, tail("/home/zhouhh/game.log",startFromEnd=true),
  agentDFOSink("hadoop48",35853) ;]
config [co1, collectorSource( 35853 ),  [collectorSink(
I found that if I restart the agent node, it resends the content of game.log
to the collector. Is there a way to resume sending from where it left off?
Or do I have to keep a mark (offset) myself, or remove the logs manually,
whenever I restart the agent node?
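
A minimal sketch of the "make a mark myself" option, assuming a hypothetical sidecar file (for example `game.log.offset`) that stores the byte offset already shipped. The helper names `load_offset`, `read_new_lines` and `commit_offset` are illustrative only and are not part of Flume; rotation handling is limited to starting over when the file shrinks.

```python
from __future__ import annotations

import os


def load_offset(offset_path: str) -> int:
    """Return the byte offset saved by a previous run, or 0 if none exists."""
    if not os.path.exists(offset_path):
        return 0
    with open(offset_path) as f:
        text = f.read().strip()
    return int(text) if text else 0


def read_new_lines(log_path: str, offset: int) -> tuple[list[str], int]:
    """Read complete lines appended after `offset`; return them and the new offset."""
    # If the file is now shorter than the mark (truncated or replaced), start over.
    if offset > os.path.getsize(log_path):
        offset = 0
    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()
    # Stop at the last newline so a half-written line is picked up next time.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end


def commit_offset(offset_path: str, offset: int) -> None:
    """Persist the mark; call this only after the lines were delivered."""
    tmp_path = offset_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(str(offset))
    # Rename over the old mark file (atomic on POSIX) to avoid a half-written offset.
    os.replace(tmp_path, offset_path)
```

Committing the offset only after the collector has accepted the lines gives at-least-once behaviour: a crash between sending and committing resends a few lines instead of the whole file.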

2. I tested Flume's performance and found it somewhat slow.
With the configuration above, I get only about 50 MB/minute.
I changed the configuration to the following:
ag1:tail("/home/zhouhh/game.log",startFromEnd=true)|batch(1000) gzip

config [co1, collectorSource( 35853 ), [collectorSink(
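
Conceptually, `batch(1000)` followed by `gzip` groups 1,000 events into one payload and compresses it before sending, so the agent makes far fewer and smaller transfers; that is one plausible reason the rate roughly doubled. The Python below only illustrates that idea under this assumption; `batch_and_compress` is a hypothetical helper, not Flume's decorator implementation.

```python
import gzip
from typing import Iterable, Iterator


def batch_and_compress(lines: Iterable[str], batch_size: int = 1000) -> Iterator[bytes]:
    """Group lines into batches of `batch_size` and gzip each batch as one payload."""
    batch: list = []
    for line in lines:
        batch.append(line)
        if len(batch) == batch_size:
            # One compressed payload per full batch instead of one send per line.
            yield gzip.compress("\n".join(batch).encode("utf-8"))
            batch = []
    # Flush the final, possibly smaller, batch.
    if batch:
        yield gzip.compress("\n".join(batch).encode("utf-8"))
```

For 2,500 log lines this yields three payloads (1,000 + 1,000 + 500 lines); the receiving side would decompress and split each payload back into individual lines.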

Sending a 300 MB log took about 3 minutes, so that is about 100 MB/minute.

In contrast, when I send the log from ag1 to co1 via scp, I get about 30 MB/second.
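
Putting the three measurements in the same unit makes the gap explicit. This is plain arithmetic on the figures quoted above; note that scp only copies the file, while the Flume path in point 1 also goes through the collector to HDFS, so the comparison is not strictly like-for-like.

```python
def to_mb_per_minute(megabytes: float, seconds: float) -> float:
    """Convert `megabytes` transferred in `seconds` into MB/minute."""
    return megabytes * 60 / seconds


# Approximate figures reported in this thread.
rates = {
    "flume (first config)": 50.0,                                # given as MB/minute
    "flume (batch(1000) gzip)": to_mb_per_minute(300, 3 * 60),   # 300 MB in ~3 minutes
    "scp": to_mb_per_minute(30, 1),                              # 30 MB/second
}

scp_rate = rates["scp"]
for name, rate in rates.items():
    if name == "scp":
        continue
    # Ratio of scp throughput to this Flume setup.
    print(f"{name}: {rate:.0f} MB/min; scp is about {scp_rate / rate:.0f}x faster")
# prints:
# flume (first config): 50 MB/min; scp is about 36x faster
# flume (batch(1000) gzip): 100 MB/min; scp is about 18x faster
```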

Can anyone give me some ideas?


Alexander Alten-Lorenz 2013-01-29, 07:29
Jeong-shik Jang 2013-01-29, 07:41
周梦想 2013-02-04, 07:27
Jeong-shik Jang 2013-02-04, 07:47
周梦想 2013-02-04, 08:07
Jeong-shik Jang 2013-02-04, 08:13
GuoWei 2013-02-04, 11:46
周梦想 2013-02-06, 02:47
周梦想 2013-02-04, 07:33
Alexander Alten-Lorenz 2013-02-04, 07:39