Flume, mail # user - Flume netcat source related problems


Re: Flume netcat source related problems
Jagadish Bihani 2012-09-05, 06:05
Hi Juhani

Thanks for the inputs.
I made the following changes:
-- I sent my string events to the socket in batches of 1000 and 10000
events.
-- I have also started using the DEBUG log level for the Flume agent.
-- I have also increased the max-line-length property of the netcat source from
the default of 512.
But both problems remain. Events still get lost without any exception,
and performance hasn't improved much (from 1 KB/sec it is now approx. 1.3
KB/sec).
Is there anything else to consider?
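
For reference, here is a minimal sketch of the kind of batching described above: several newline-terminated events joined into one payload and written with a single send per batch. It is in Python rather than the original test script, and the names batch_payloads and send_batched are made up for illustration. It assumes the listener treats each line as one event and that every line stays under the source's max-line-length; it does not read any reply the listener may write back.

```python
import socket


def batch_payloads(events, batch_size):
    """Yield byte payloads, each holding up to batch_size newline-terminated events."""
    batch = []
    for event in events:
        # Strip only a trailing newline; embedded newlines would split an event.
        batch.append(event.rstrip("\n"))
        if len(batch) == batch_size:
            yield ("\n".join(batch) + "\n").encode("utf-8")
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield ("\n".join(batch) + "\n").encode("utf-8")


def send_batched(host, port, events, batch_size=1000):
    """Send events to a line-oriented TCP listener, one sendall() call per batch.

    Returns the number of lines written to the socket, which can be compared
    with the number of events that reach the sink.
    """
    sent = 0
    with socket.create_connection((host, port)) as sock:
        for payload in batch_payloads(events, batch_size):
            sock.sendall(payload)
            sent += payload.count(b"\n")
    return sent
```

With the agent configuration quoted below, a call could look like send_batched("10.0.17.231", 55355, lines, batch_size=1000), where lines is any iterable of strings.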

Regards,
Jagadish
On 09/04/2012 04:40 PM, Juhani Connolly wrote:
> Hi Jagadish,
>
> NetcatSource doesn't use any batching when receiving events. It writes
> one event at a time, and that translates in the FileChannel to a flush
> to disk, so when you're writing many, your disk just won't keep up.
> One way to improve this is to use separate physical disks for your
> checkpoint/data directories.
>
> TailSource used to have the same problem until we added batching to
> it. By a cursory examination of NetcatSource, it looks to me like you
> can also force some batching by sending multiple lines in each
> socket->send.
>
> As to the first problem with lines going missing, I'm not entirely
> sure as I can't dive deeply into the source right now. I wouldn't be
> surprised if it's some kind of congestion problem and lack of
> logging (or your log levels are just too high, try switching them to
> INFO or DEBUG?) that will be resolved once you get the throughput up.
>
> On 09/04/2012 07:50 PM, Jagadish Bihani wrote:
>> Hi
>>
>> I encountered a problem in my scenario with the netcat source. The setup is
>> Host A: Netcat source - file channel - avro sink
>> Host B: Avro source - file channel - HDFS sink
>> But to simplify it I have created a single agent with "Netcat Source"
>> and "file roll sink". It is:
>> Host A: Netcat source - file channel - File_roll sink
>>
>> *Problem*:
>> 1. To simulate our production scenario, I have created a script
>> which runs for 15 sec and, in a while loop, writes requests to the
>> netcat source on a given port. For a large sleep value, events are
>> delivered correctly to the destination. But as I reduce the delay,
>> events are given to the source but they are not delivered to the
>> destination. E.g. I write 9108 records within 15 sec using the script
>> and only 1708 got delivered. And I don't get any exception. If it is
>> a flow-control related problem then I should have seen some exception
>> in the agent logs. But with a file channel and huge disk space, is it
>> a problem?
>>
>> *Machine Configuration:*
>> RAM : 8 GB
>> JVM : 200 MB
>> CPU: 2.0 GHz Quad core processor
>>
>> *Flume Agent Configuration*
>> adServerAgent.sources = netcatSource
>> adServerAgent.channels = fileChannel memoryChannel
>> adServerAgent.sinks = fileSink
>>
>> # For each one of the sources, the type is defined
>> adServerAgent.sources.netcatSource.type = netcat
>> adServerAgent.sources.netcatSource.bind = 10.0.17.231
>> adServerAgent.sources.netcatSource.port = 55355
>>
>> # The channel can be defined as follows.
>> adServerAgent.sources.netcatSource.channels = fileChannel
>> #adServerAgent.sources.netcatSource.channels = memoryChannel
>>
>> # Each sink's type must be defined
>> adServerAgent.sinks.fileSink.type = file_roll
>> adServerAgent.sinks.fileSink.sink.directory = /root/flume/flume_sink
>>
>> #Specify the channel the sink should use
>> #adServerAgent.sinks.fileSink.channel = memoryChannel
>> adServerAgent.sinks.fileSink.channel = fileChannel
>>
>> adServerAgent.channels.memoryChannel.type =memory
>> adServerAgent.channels.memoryChannel.capacity = 100000
>> adServerAgent.channels.memoryChannel.transactionCapacity = 10000
>>
>> adServerAgent.channels.fileChannel.type=file
>> adServerAgent.channels.fileChannel.dataDirs=/root/jagadish/flume_channel1/dataDir3
>> adServerAgent.channels.fileChannel.checkpointDir=/root/jagadish/flume_channel1/checkpointDir3
>>
>> *Script  snippet being used:*
>> ...
>> eval
>> {
>>         local $SIG{ALRM} = sub { die "alarm\n"; };