
Flume >> mail # user >> Lock contention in FileChannel


Re: Lock contention in FileChannel
Gotcha. When you run the test, what is the disk utilization percentage?
You can use iostat for this (for example, the %util column of iostat -x).
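As a rough illustration of what iostat reports: on Linux, %util is essentially the share of wall-clock time a device had I/O in flight. It is derived from the "milliseconds spent doing I/Os" counter in /proc/diskstats. The sketch below samples that counter twice. It assumes a Linux kernel that exposes /proc/diskstats in the standard layout, and the function names are hypothetical.

```python
import time

def read_io_ticks(device, path="/proc/diskstats"):
    """Return cumulative milliseconds `device` has spent doing I/O (Linux only)."""
    with open(path) as f:
        for line in f:
            fields = line.split()
            # Layout: major, minor, device name, then the counters;
            # index 12 is "milliseconds spent doing I/Os".
            if len(fields) > 12 and fields[2] == device:
                return int(fields[12])
    raise ValueError(f"device {device!r} not found in {path}")

def utilization_percent(ticks_before, ticks_after, interval_ms):
    """Percentage of the interval the device was busy (compare iostat %util)."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    busy_ms = ticks_after - ticks_before
    # Clamp to 100%, since counter and clock sampling are not perfectly aligned.
    return min(100.0, 100.0 * busy_ms / interval_ms)

def sample_utilization(device, interval_s=1.0):
    """Read the busy counter twice and return utilization over the gap."""
    start = time.monotonic()
    before = read_io_ticks(device)
    time.sleep(interval_s)
    after = read_io_ticks(device)
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return utilization_percent(before, after, elapsed_ms)
```

To check saturation, you might call sample_utilization("sdb") on the collector box, using whichever device backs /flume1 or /flume2 (the device name here is only an example). A value close to 100% suggests that disk is saturated. However, devices that serve many requests in parallel, such as SSDs or RAID arrays, can show a high %util without being saturated.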
On Aug 13, 2013 9:47 PM, "Pankaj Gupta" <[EMAIL PROTECTED]> wrote:

> Those are the boxes we want to collect data from. They run Flume and send
> data through their Avro sinks to the Avro source on this box. We are
> getting data at a pretty good rate; the problem is that events don't
> drain from the FileChannel fast enough, so the channel fill percentage
> keeps rising.
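To make "not draining fast enough" concrete: the fill percentage is the number of events in the channel divided by its capacity. It keeps rising whenever sources put events faster than sinks take them. The following is a minimal sketch that takes put and take rates in events per second as inputs. These inputs are hypothetical; in practice they might be estimated from deltas of the channel's put and take success counters in Flume's monitoring output.

```python
def fill_percentage(size, capacity):
    """Events currently stored in the channel as a percentage of capacity."""
    return 100.0 * size / capacity

def seconds_until_full(size, capacity, put_rate, take_rate):
    """Seconds until the channel reaches capacity, or None if it is not filling.

    put_rate: events/s entering the channel (source puts)
    take_rate: events/s leaving the channel (sink takes)
    """
    net_rate = put_rate - take_rate
    if net_rate <= 0:
        return None  # sinks keep up with (or outpace) the sources
    return (capacity - size) / net_rate
```

Take the configured capacity of 75,000,000 and hypothetical rates of 50,000 puts/s and 40,000 takes/s. A channel holding 30,000,000 events is then 40% full and would fill in 4,500 seconds.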
>
> On Tue, Aug 13, 2013 at 7:41 PM, Brock Noland <[EMAIL PROTECTED]> wrote:
>
>> What is sending the events to the avro source?
>> On Aug 13, 2013 9:34 PM, "Pankaj Gupta" <[EMAIL PROTECTED]> wrote:
>>
>>> Here's the config:
>>> # define channels, one for each disk
>>>
>>> agent1.channels.ch1.type = FILE
>>> agent1.channels.ch1.checkpointDir = /flume1/checkpoint
>>> agent1.channels.ch1.dataDirs = /flume1/data
>>> agent1.channels.ch1.maxFileSize = 375809638400
>>> agent1.channels.ch1.capacity = 75000000
>>> agent1.channels.ch1.transactionCapacity = 4000
>>>
>>> agent1.channels.ch2.type = FILE
>>> agent1.channels.ch2.checkpointDir = /flume2/checkpoint
>>> agent1.channels.ch2.dataDirs = /flume2/data
>>> agent1.channels.ch2.maxFileSize = 375809638400
>>> agent1.channels.ch2.capacity = 75000000
>>> agent1.channels.ch2.transactionCapacity = 4000
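Each sink takes up to batch-size events from its channel in a single transaction. A sink batch larger than the channel's transactionCapacity would therefore be expected to fail. In this config both values are 4000, so they fit exactly. The small checker below assumes a simple "key = value" properties layout, with no continuation lines and no ':' separators. The function names are hypothetical.

```python
import re

def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props

def oversized_sink_batches(text, agent="agent1"):
    """Return (sink, batch_size, channel, transaction_capacity) tuples for
    every sink whose batch-size exceeds its channel's transactionCapacity.

    Sinks or channels that leave either setting unset are skipped, because
    this sketch does not model Flume's default values.
    """
    props = parse_properties(text)
    pattern = re.compile(rf"^{re.escape(agent)}\.sinks\.([^.]+)\.batch-size$")
    problems = []
    for key, value in props.items():
        match = pattern.match(key)
        if not match:
            continue
        sink = match.group(1)
        channel = props.get(f"{agent}.sinks.{sink}.channel")
        if channel is None:
            continue
        capacity = props.get(f"{agent}.channels.{channel}.transactionCapacity")
        if capacity is None:
            continue
        if int(value) > int(capacity):
            problems.append((sink, int(value), channel, int(capacity)))
    return problems
```

For the settings shown here, it would return an empty list, because every sink uses batch-size 4000 and both channels allow 4000 events per transaction.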
>>>
>>> # Define an Avro source named avroSource1
>>> # Each sink can connect to only one channel.
>>> # Connect it to channel ch1. Load-balance it across 4 avroSinks
>>>
>>> agent1.sources.avroSource1.channels = ch1
>>> agent1.sources.avroSource1.type = avro
>>> agent1.sources.avroSource1.bind = 0.0.0.0
>>> agent1.sources.avroSource1.port = <port>
>>>
>>> agent1.sinks.avroSink1-1-1.type = avro
>>> agent1.sinks.avroSink1-1-1.channel = ch1
>>> agent1.sinks.avroSink1-1-1.hostname = <hostname>
>>> agent1.sinks.avroSink1-1-1.port = <port>
>>> agent1.sinks.avroSink1-1-1.connect-timeout = 300000
>>> agent1.sinks.avroSink1-1-1.batch-size = 4000
>>>
>>> agent1.sinks.avroSink1-2-1.type = avro
>>> agent1.sinks.avroSink1-2-1.channel = ch1
>>> agent1.sinks.avroSink1-2-1.hostname = <hostname>
>>> agent1.sinks.avroSink1-2-1.port = <port>
>>> agent1.sinks.avroSink1-2-1.connect-timeout = 300000
>>> agent1.sinks.avroSink1-2-1.batch-size = 4000
>>>
>>> agent1.sinks.avroSink1-3-1.type = avro
>>> agent1.sinks.avroSink1-3-1.channel = ch1
>>> agent1.sinks.avroSink1-3-1.hostname = <hostname>
>>> agent1.sinks.avroSink1-3-1.port = <port>
>>> agent1.sinks.avroSink1-3-1.connect-timeout = 300000
>>> agent1.sinks.avroSink1-3-1.batch-size = 4000
>>>
>>> agent1.sinks.avroSink1-4-1.type = avro
>>> agent1.sinks.avroSink1-4-1.channel = ch1
>>> agent1.sinks.avroSink1-4-1.hostname = <hostname>
>>> agent1.sinks.avroSink1-4-1.port = <port>
>>> agent1.sinks.avroSink1-4-1.connect-timeout = 300000
>>> agent1.sinks.avroSink1-4-1.batch-size = 4000
>>>
>>> # Add the sink groups: each group load-balances (round robin) across
>>> # sinks that point to different hops
>>> agent1.sinkgroups.group1.sinks = avroSink1-1-1 avroSink1-2-1 avroSink1-3-1 avroSink1-4-1
>>> agent1.sinkgroups.group1.processor.type = load_balance
>>> agent1.sinkgroups.group1.processor.selector = ROUND_ROBIN
>>> agent1.sinkgroups.group1.processor.backoff = true
>>>
>>> #End of set
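The load_balance processor, configured with selector ROUND_ROBIN and backoff = true, rotates through the four sinks in group1 and temporarily skips a sink that has failed. The class below is a simplified model of that behaviour, not Flume's LoadBalancingSinkProcessor. The following details are assumptions chosen for illustration:

- the back-off doubles on each consecutive failure
- the base delay is 1000 ms
- the delay is capped at 30000 ms

```python
class RoundRobinWithBackoff:
    """Hypothetical model of a round-robin sink selector with back-off.

    Time is passed in explicitly (milliseconds) so the behaviour is
    deterministic and easy to test.
    """

    def __init__(self, sinks, base_backoff_ms=1000, max_backoff_ms=30000):
        self.sinks = list(sinks)
        self.base_backoff_ms = base_backoff_ms
        self.max_backoff_ms = max_backoff_ms
        self.next_index = 0
        self.failures = {s: 0 for s in self.sinks}
        self.retry_at = {s: 0 for s in self.sinks}

    def candidates(self, now_ms):
        """Sinks to try, in rotated order, skipping those still backing off."""
        n = len(self.sinks)
        start = self.next_index
        self.next_index = (self.next_index + 1) % n  # rotate for the next call
        order = [self.sinks[(start + i) % n] for i in range(n)]
        return [s for s in order if self.retry_at[s] <= now_ms]

    def report_failure(self, sink, now_ms):
        """Back the sink off; the delay doubles per consecutive failure."""
        self.failures[sink] += 1
        delay = min(self.max_backoff_ms,
                    self.base_backoff_ms * 2 ** (self.failures[sink] - 1))
        self.retry_at[sink] = now_ms + delay

    def report_success(self, sink):
        """A successful delivery clears the sink's back-off state."""
        self.failures[sink] = 0
        self.retry_at[sink] = 0
```

In this model, a single failed hop only removes its sink from rotation for a while. Throughput drops to what the remaining sinks can drain, which is one way the channel can back up.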
>>>
>>> # Define an Avro source named avroSource2
>>> # Each sink can connect to only one channel.
>>> # Connect it to channel ch2. Load balance it to 2 avroSinks
>>>
>>> agent1.sources.avroSource2.channels = ch2
>>> agent1.sources.avroSource2.type = avro
>>> agent1.sources.avroSource2.bind = 0.0.0.0
>>> agent1.sources.avroSource2.port = <port>
>>>
>>> agent1.sinks.avroSink2-1-1.type = avro
>>> agent1.sinks.avroSink2-1-1.channel = ch2
>>> agent1.sinks.avroSink2-1-1.hostname = <hostname>
>>> agent1.sinks.avroSink2-1-1.port = <port>
>>> agent1.sinks.avroSink2-1-1.connect-timeout = 300000
>>> agent1.sinks.avroSink2-1-1.batch-size = 4000