Flume, mail # user - Netcat source stops processing data

Netcat source stops processing data
Rahul Ravindran 2012-11-08, 23:05
I wanted to perform a load test to get an idea of how we would need to scale Flume for our deployment. I have pasted the config file below. I have a netcat source listening on a port, with 2 channels and 2 avro sinks consuming the events from the netcat source.

My load generator is a simple C program that continually sends 20-character messages over a socket using send(). Initially a lot of traffic makes it through, and then the Flume agent appears to stop consuming data (after about 80k messages). This results in the TCP receive and send buffers filling up. I understand that the rate at which I am generating traffic may overwhelm Flume, but I would expect it to keep consuming data gradually. Instead, it does not consume any more messages. I looked through the Flume logs and did not see anything there (no stack trace). I ran tcpdump and see that the receive window is initially non-zero, then decreases until it reaches zero, and then very slowly opens up to a size of 1 (about once every 10 seconds).
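
For readers who want to reproduce the test, here is a minimal sketch of such a load generator. It is written in Python, not the original C, and it is only an approximation under stated assumptions: the names `make_message` and `send_messages` are hypothetical, and each 20-byte message is assumed to include a trailing newline, since the netcat source turns each line of text into one event.

```python
import socket

MESSAGE_SIZE = 20  # assumption: 20 bytes per message, trailing newline included


def make_message(seq: int) -> bytes:
    """Build one fixed-size, newline-terminated message (hypothetical format)."""
    body = f"msg{seq:016d}"  # 3 + 16 = 19 characters for seq < 10**16
    return (body + "\n").encode("ascii")


def send_messages(sock: socket.socket, count: int) -> int:
    """Send `count` messages on an already connected socket; return bytes sent."""
    total = 0
    for seq in range(count):
        data = make_message(seq)
        # sendall() blocks once the local send buffer is full, which is what
        # happens when the peer stops reading and its receive window closes.
        sock.sendall(data)
        total += len(data)
    return total
```

Passing the result of `socket.create_connection((host, 44444))` to `send_messages` would point it at the netcat source configured below; `host` is a placeholder for wherever the agent runs.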

Could you help me understand what may be going on, or whether there is something wrong with my config?

agent1.channels.ch1.type = MEMORY
agent1.channels.ch1.capacity = 50000
agent1.channels.ch1.transactionCapacity = 5000

agent1.sources.netcat.channels = ch1
agent1.sources.netcat.type = netcat
agent1.sources.netcat.bind =
agent1.sources.netcat.port = 44444

agent1.sinks.avroSink1.type = avro
agent1.sinks.avroSink1.channel = ch1
agent1.sinks.avroSink1.hostname = <remote hostname>
agent1.sinks.avroSink1.port = 4545
agent1.sinks.avroSink1.connect-timeout = 300000
agent1.sinks.avroSink2.type = avro
agent1.sinks.avroSink2.channel = ch1
agent1.sinks.avroSink2.hostname = <remote hostname>
agent1.sinks.avroSink2.port = 4546
agent1.sinks.avroSink2.connect-timeout = 300000

agent1.channels = ch1
agent1.sources = netcat
agent1.sinks = avroSink1 avroSink2
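
As a quick sanity check on the wiring in a config like the one above, a small script can confirm that every listed source, channel, and sink has a type, that nothing is listed twice, and that every referenced channel is declared. The sketch below is not Flume's own validation; `parse_properties` and `check_agent` are hypothetical helpers that only assume the `agent.kind.name.property` key layout visible in the config.

```python
def parse_properties(text: str) -> dict:
    """Parse simple `key = value` lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_agent(props: dict, agent: str) -> list:
    """Return a list of human-readable wiring problems for one agent."""
    problems = []
    listed = {kind: props.get(f"{agent}.{kind}", "").split()
              for kind in ("sources", "channels", "sinks")}
    for kind, names in listed.items():
        for name in sorted(set(names)):
            if names.count(name) > 1:
                problems.append(f"{kind}: '{name}' is listed more than once")
            if f"{agent}.{kind}.{name}.type" not in props:
                problems.append(f"{kind}: '{name}' has no type property")
    channels = set(listed["channels"])
    # Each sink reads from exactly one channel.
    for sink in sorted(set(listed["sinks"])):
        channel = props.get(f"{agent}.sinks.{sink}.channel", "")
        if channel not in channels:
            problems.append(f"sinks: '{sink}' uses undeclared channel '{channel}'")
    # Each source may write to several channels.
    for source in sorted(set(listed["sources"])):
        for channel in props.get(f"{agent}.sources.{source}.channels", "").split():
            if channel not in channels:
                problems.append(f"sources: '{source}' uses undeclared channel '{channel}'")
    return problems
```

Calling `check_agent(parse_properties(config_text), "agent1")` returns an empty list when the wiring is consistent. It checks only the structure of the file, not runtime behaviour such as throughput or buffer sizes.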