Flume >> mail # user >> flume tail source problem and performance


flume tail source problem and performance
Hello,
1. I want to tail a log file and write it to HDFS. My configuration is below:
config [ag1, tail("/home/zhouhh/game.log",startFromEnd=true),
agentDFOSink("hadoop48",35853) ;]
config [ag2, tail("/home/zhouhh/game.log",startFromEnd=true),
agentDFOSink("hadoop48",35853) ;]
config [co1, collectorSource( 35853 ),  [collectorSink(
"hdfs://hadoop48:54310/user/flume/%y%m/%d","%{host}-",5000,raw),collectorSink(
"hdfs://hadoop48:54310/user/flume/%y%m","%{host}-",10000,raw)]]
I found that if I restart the agent node, it resends the content of game.log
to the collector. Is there a way to resume sending from the point where the
previous run stopped? Or do I have to keep a marker myself, or remove the logs
manually, whenever I restart the agent node?
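
To make the "keep a marker myself" option concrete, here is a minimal Python sketch of the idea, not a Flume feature: it stores the byte offset already read in a side file and resumes from it after a restart. The function name read_new_lines and the marker file path are hypothetical, chosen only for illustration.

```python
import os

def read_new_lines(log_path, offset_path):
    """Return complete lines appended to log_path since the saved marker.

    Illustration only: offset_path is a hypothetical side file that holds
    the byte offset already consumed. This is not part of Flume.
    """
    # Load the last saved byte offset; start at 0 if no marker exists yet.
    try:
        with open(offset_path) as f:
            offset = int(f.read().strip() or 0)
    except FileNotFoundError:
        offset = 0

    # If the log is shorter than the marker, assume it was rotated or truncated.
    if os.path.getsize(log_path) < offset:
        offset = 0

    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()

    # Keep only complete lines; a partial last line waits for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].splitlines()

    # Persist the new marker via write-then-rename, so a crash cannot
    # leave a half-written marker file behind.
    tmp = offset_path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(offset + end))
    os.replace(tmp, offset_path)
    return lines
```

Calling read_new_lines("/home/zhouhh/game.log", "/home/zhouhh/game.log.offset") twice in a row would return the new lines the first time and an empty list the second time, which is the behaviour I would like from the agent after a restart.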

2. I tested the performance of Flume and found it a bit slow.
With the configuration above, throughput is only 50MB/minute.
I changed the configuration to the following:
ag1:tail("/home/zhouhh/game.log",startFromEnd=true)|batch(1000) gzip
agentDFOSink("hadoop48",35853);

config [co1, collectorSource( 35853 ), [collectorSink(
"hdfs://hadoop48:54310/user/flume/%y%m/%d","%{host}-",5000,raw),collectorSink(
"hdfs://hadoop48:54310/user/flume/%y%m","%{host}-",10000,raw)]]

I sent 300MB of logs and it took about 3 minutes, so throughput is about 100MB/minute.

When I send the same log from ag1 to co1 via scp, I get about 30MB/second.
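
To put the figures on one scale, here is a small Python calculation that converts everything to MB/minute. It uses only the numbers quoted in this message; the variable names are just for illustration.

```python
# All rates in MB per minute, taken from the measurements in this message.
flume_plain = 50.0        # first configuration
flume_batched = 300 / 3   # 300MB in about 3 minutes with batch(1000) and gzip
scp = 30 * 60             # 30MB/second expressed per minute

print(f"flume (plain):   {flume_plain:.0f} MB/min")
print(f"flume (batched): {flume_batched:.0f} MB/min")
print(f"scp:             {scp:.0f} MB/min")
print(f"scp vs batched:  {scp / flume_batched:.0f}x")
print(f"scp vs plain:    {scp / flume_plain:.0f}x")
```

So scp moves the same data roughly 18 times faster than the batched configuration and 36 times faster than the first one.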

Does anyone have any ideas?

Thanks!

Andy

Replies:
Alexander Alten-Lorenz 2013-01-29, 07:29
Jeong-shik Jang 2013-01-29, 07:41
周梦想 2013-02-04, 07:27
Jeong-shik Jang 2013-02-04, 07:47
周梦想 2013-02-04, 08:07
Jeong-shik Jang 2013-02-04, 08:13
GuoWei 2013-02-04, 11:46
周梦想 2013-02-06, 02:47
周梦想 2013-02-04, 07:33
Alexander Alten-Lorenz 2013-02-04, 07:39