Flume netcat source related problems

I encountered a problem with the netcat source in my scenario. The setup is:
Host A: Netcat source - file channel - Avro sink
Host B: Avro source - file channel - HDFS sink
To simplify it, I created a single agent with a netcat source and a
file_roll sink:
Host A: Netcat source - file channel - File_roll sink

1. To simulate our production scenario, I created a script that runs for
15 sec and, in a while loop, writes requests to the netcat source on a
given port, sleeping between writes. With a large sleep value, events are
delivered correctly to the destination. But as I reduce the delay, events
are still handed to the source but are not delivered to the destination.
E.g. I wrote 9108 records within 15 sec using the script and only 1708
got delivered, and I don't get any exception. If it were a flow-control
problem, I should have seen some exception in the agent logs. But with a
file channel and plenty of disk space, should this be a problem at all?
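
The test described above can be sketched as follows, in Python rather than the Perl used for the actual script. It is only an illustration under assumptions: `send_lines`, its parameters, and the host and port are hypothetical names, and the sketch assumes the netcat source answers each accepted line with a line reading `OK`, which should be checked against the Flume version in use. Waiting for that reply after every line lets the sender count how many events the source acknowledged, separately from how many it wrote.

```python
import socket
import time


def send_lines(host, port, lines, delay_s=0.0, timeout_s=5.0):
    """Send each line, wait for one reply line, return (sent, acked).

    Assumption: the receiver replies with a line reading "OK" for every
    line it accepts. Any other reply is counted as not acknowledged.
    """
    sent = 0
    acked = 0
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        # Text-mode reader so replies can be consumed one line at a time.
        reader = sock.makefile("r", encoding="utf-8", newline="\n")
        for line in lines:
            sock.sendall((line + "\n").encode("utf-8"))
            sent += 1
            reply = reader.readline()
            if not reply:
                # The receiver closed the connection; stop sending.
                break
            if reply.strip() == "OK":
                acked += 1
            if delay_s:
                # Pause between writes, like the sleep in the test script.
                time.sleep(delay_s)
    return sent, acked


if __name__ == "__main__":
    # Hypothetical target: an agent listening locally on the port above.
    rows = ["\t".join([str(n)] * 5) for n in range(1, 101)]
    print(send_lines("localhost", 55355, rows, delay_s=0.01))
```

If `acked` equals `sent` but the sink output contains fewer lines, the loss would be downstream of the source. If `acked` is lower, the source itself did not accept every line.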

*Machine Configuration:*
RAM : 8 GB
JVM : 200 MB
CPU: 2.0 GHz Quad core processor

*Flume Agent Configuration*
adServerAgent.sources = netcatSource
adServerAgent.channels = fileChannel memoryChannel
adServerAgent.sinks = fileSink

# For each one of the sources, the type is defined
adServerAgent.sources.netcatSource.type = netcat
adServerAgent.sources.netcatSource.bind =
adServerAgent.sources.netcatSource.port = 55355

# The channel can be defined as follows.
adServerAgent.sources.netcatSource.channels = fileChannel
#adServerAgent.sources.netcatSource.channels = memoryChannel

# Each sink's type must be defined
adServerAgent.sinks.fileSink.type = file_roll
adServerAgent.sinks.fileSink.sink.directory = /root/flume/flume_sink

#Specify the channel the sink should use
#adServerAgent.sinks.fileSink.channel = memoryChannel
adServerAgent.sinks.fileSink.channel = fileChannel

adServerAgent.channels.memoryChannel.type = memory
adServerAgent.channels.memoryChannel.capacity = 100000
adServerAgent.channels.memoryChannel.transactionCapacity = 10000


*Script snippet being used:*
         local $SIG{ALRM} = sub { die "alarm\n"; };
         alarm $TIMEOUT;
         my $i=0;
         my $str = "";
         my $counter=1;
                         $str = "";
                         for($i=0; $i < $NO_ELE_PER_ROW; $i++) {
                                 $str .= $counter."\t";
                         }
                         #print $socket "$str\n";
                         $socket->send($str."\n") or die "Didn't send";

                         if($? != 0) {
                                 print "Failed for $str \n";
                         }
                         print "$str\n";
         alarm 0;
if ($@) {

- The script itself is working fine, since with a very large delay all
events are transmitted correctly.
- The same problem occurs with the memory channel too, but with lower values of

*Problem 2:*
-- With this setup I am getting very low throughput, i.e. I am able to
transfer only ~1 KB/sec of data to the destination file sink. Similar
performance was achieved using the HDFS sink.
-- I tried increasing batch sizes in my original scenario without
much gain in throughput.
-- Using 'tail -F' as the source, I saw almost 10 times better throughput.
-- Is there any tunable parameter for the netcat source?

Please help me with the above 2 cases: i) netcat source use cases,
ii) typical expected Flume throughput with a file channel and a file/HDFS
sink on a single machine.

Juhani Connolly 2012-09-04, 11:10
Jagadish Bihani 2012-09-05, 06:05
Steve Johnson 2012-09-05, 14:45
Juhani Connolly 2012-09-06, 02:23