Flume, mail # user - Flume netcat source related problems


Jagadish Bihani 2012-09-04, 10:50
Re: Flume netcat source related problems
Juhani Connolly 2012-09-04, 11:10
Hi Jagadish,

NetcatSource doesn't batch incoming events. It writes one event at a
time, and in the FileChannel each of those writes becomes a flush to
disk, so when you send many events quickly your disk just can't keep up.
One way to improve this is to put your checkpoint and data directories
on separate physical disks.
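
As a rough illustration of the separate-disk suggestion, the file
channel's two directories could point at different mounts. The property
names are the ones from the configuration quoted below; the mount points
/disk1 and /disk2 are hypothetical placeholders, assumed here to sit on
different physical disks.

```properties
# Hypothetical paths: /disk1 and /disk2 are assumed to be separate physical disks.
adServerAgent.channels.fileChannel.type = file
adServerAgent.channels.fileChannel.checkpointDir = /disk1/flume/checkpoint
adServerAgent.channels.fileChannel.dataDirs = /disk2/flume/data
```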

TailSource used to have the same problem until we added batching to it.
From a cursory look at NetcatSource, it seems you can also force some
batching by sending multiple lines in each socket->send call.
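
A minimal sketch of the client side of that idea, written in Python
rather than the Perl of the original script. It groups lines into
newline-terminated payloads and issues one sendall() per group. The
function names build_batches and send_batched and the default batch size
are made up for this illustration, and whether the source actually
commits fewer transactions as a result depends on the NetcatSource
implementation in your Flume version.

```python
import socket
from typing import Iterable, Iterator, List


def build_batches(lines: Iterable[str], batch_size: int) -> Iterator[bytes]:
    """Group lines into newline-terminated payloads of up to batch_size lines."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for line in lines:
        batch.append(line)
        if len(batch) == batch_size:
            yield ("\n".join(batch) + "\n").encode("utf-8")
            batch = []
    if batch:  # flush the final, possibly short, batch
        yield ("\n".join(batch) + "\n").encode("utf-8")


def send_batched(sock: socket.socket, lines: Iterable[str], batch_size: int = 100) -> int:
    """Send lines over an already-connected socket, one sendall() per batch.

    Returns the number of sendall() calls made.
    """
    calls = 0
    for payload in build_batches(lines, batch_size):
        sock.sendall(payload)
        calls += 1
    return calls
```

To use it, open a socket with socket.create_connection() against the
bind address and port of the netcat source, then pass that socket and an
iterable of tab-separated rows to send_batched().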

As for the first problem, with lines going missing, I'm not entirely
sure, as I can't dive deeply into the source right now. I wouldn't be
surprised if it's some kind of congestion problem that isn't being
logged (or your log level is set too high to show it; try switching to
INFO or DEBUG) and that it will be resolved once you get the throughput up.

On 09/04/2012 07:50 PM, Jagadish Bihani wrote:
> Hi
>
> I encountered a problem in my scenario with the netcat source. The setup is:
> Host A: Netcat source - file channel - avro sink
> Host B: Avro source - file channel - HDFS sink
> But to simplify it I have created a single agent with "Netcat Source"
> and "file roll sink". It is:
> Host A: Netcat source - file channel - File_roll sink
>
> *Problem*:
> 1. To simulate our production scenario, I have created a script
> which runs for 15 sec and, in a
> while loop, writes requests to the netcat source on a given port. For a
> large value of the sleep delay, events are
> delivered correctly to the destination. But as I reduce the delay,
> events are given to the source but they
> are not delivered to the destination. E.g. I write 9108 records within
> 15 sec using the script and only 1708
> get delivered. And I don't get any exception. If it is a flow control
> related problem then I should have seen
> some exception in the agent logs. But with a file channel and huge disk
> space, is it a problem?
>
> *Machine Configuration:*
> RAM : 8 GB
> JVM : 200 MB
> CPU: 2.0 GHz Quad core processor
>
> *Flume Agent Configuration*
> adServerAgent.sources = netcatSource
> adServerAgent.channels = fileChannel memoryChannel
> adServerAgent.sinks = fileSink
>
> # For each one of the sources, the type is defined
> adServerAgent.sources.netcatSource.type = netcat
> adServerAgent.sources.netcatSource.bind = 10.0.17.231
> adServerAgent.sources.netcatSource.port = 55355
>
> # The channel can be defined as follows.
> adServerAgent.sources.netcatSource.channels = fileChannel
> #adServerAgent.sources.netcatSource.channels = memoryChannel
>
> # Each sink's type must be defined
> adServerAgent.sinks.fileSink.type = file_roll
> adServerAgent.sinks.fileSink.sink.directory = /root/flume/flume_sink
>
> #Specify the channel the sink should use
> #adServerAgent.sinks.fileSink.channel = memoryChannel
> adServerAgent.sinks.fileSink.channel = fileChannel
>
> adServerAgent.channels.memoryChannel.type =memory
> adServerAgent.channels.memoryChannel.capacity = 100000
> adServerAgent.channels.memoryChannel.transactionCapacity = 10000
>
> adServerAgent.channels.fileChannel.type=file
> adServerAgent.channels.fileChannel.dataDirs=/root/jagadish/flume_channel1/dataDir3
> adServerAgent.channels.fileChannel.checkpointDir=/root/jagadish/flume_channel1/checkpointDir3
>
> *Script snippet being used:*
> ...
> eval
> {
>         local $SIG{ALRM} = sub { die "alarm\n"; };
>         alarm $TIMEOUT;
>         my $i=0;
>         my $str = "";
>         my $counter=1;
>         while(1)
>         {
>                         $str = "";
>                         for($i=0; $i < $NO_ELE_PER_ROW; $i++)
>                         {
>                                 $str .= $counter."\t";
>                                 $counter++;
>                         }
>                         chop($str);
>                         #print $socket "$str\n";
>                         $socket->send($str."\n") or die "Didn't send";
>
>                         if($? != 0)
>                         {
>                                 print "Failed for $str \n";
Jagadish Bihani 2012-09-05, 06:05
Steve Johnson 2012-09-05, 14:45
Juhani Connolly 2012-09-06, 02:23