Flume >> mail # user >> Lock contention in FileChannel


Re: Lock contention in FileChannel
I did try increasing the number of FileChannels. At 2 FileChannels per disk,
performance seemed to be 25% better. At 4 FileChannels per disk, performance
dropped even below that of 1 FileChannel per disk. I will try increasing the
number of dataDirs tomorrow.
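
For reference, the dataDirs change suggested in the quoted reply below might look like the following for ch1. This is only a sketch: the three directory paths are hypothetical placeholders that do not appear in the original config, and the remaining settings are copied from the config quoted further down. Whether it actually relieves the contention is what the follow-up test is meant to show.

```properties
# Sketch only: same ch1 settings as in the quoted config, but with dataDirs
# given as a comma-separated list instead of a single directory.
# The /flume1/data1..3 paths are hypothetical placeholders.
agent1.channels.ch1.type = FILE
agent1.channels.ch1.checkpointDir = /flume1/checkpoint
agent1.channels.ch1.dataDirs = /flume1/data1,/flume1/data2,/flume1/data3
agent1.channels.ch1.maxFileSize = 375809638400
agent1.channels.ch1.capacity = 75000000
agent1.channels.ch1.transactionCapacity = 4000
```
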
On Tue, Aug 13, 2013 at 8:06 PM, Brock Noland <[EMAIL PROTECTED]> wrote:

> dataDirs is a comma-separated list. Try 3-4 directories and then run the same
> test.
> On Aug 13, 2013 9:58 PM, "Pankaj Gupta" <[EMAIL PROTECTED]> wrote:
>
>> Both disks were at around 15-25%.
>>
>>
>> On Tue, Aug 13, 2013 at 7:54 PM, Brock Noland <[EMAIL PROTECTED]> wrote:
>>
>>> Gotcha. When you run the test, what is the disk utilization percentage?
>>> iostat can be used for this.
>>> On Aug 13, 2013 9:47 PM, "Pankaj Gupta" <[EMAIL PROTECTED]> wrote:
>>>
>>>> Those are the boxes we want to collect data from. They run flume and
>>>> send data through their avro sinks to the avro source on this box. We are
>>>> getting data at a pretty good rate and the problem is in fact that the
>>>> events don't drain from the FileChannel fast enough and the channel fill
>>>> percentage keeps getting higher.
>>>>
>>>>
>>>> On Tue, Aug 13, 2013 at 7:41 PM, Brock Noland <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> What is sending the events to the avro source?
>>>>> On Aug 13, 2013 9:34 PM, "Pankaj Gupta" <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Here's the config:
>>>>>> # define channels, one for each disk
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> agent1.channels.ch1.type = FILE
>>>>>> agent1.channels.ch1.checkpointDir = /flume1/checkpoint
>>>>>> agent1.channels.ch1.dataDirs = /flume1/data
>>>>>> agent1.channels.ch1.maxFileSize = 375809638400
>>>>>> agent1.channels.ch1.capacity = 75000000
>>>>>> agent1.channels.ch1.transactionCapacity = 4000
>>>>>>
>>>>>> agent1.channels.ch2.type = FILE
>>>>>> agent1.channels.ch2.checkpointDir = /flume2/checkpoint
>>>>>> agent1.channels.ch2.dataDirs = /flume2/data
>>>>>> agent1.channels.ch2.maxFileSize = 375809638400
>>>>>> agent1.channels.ch2.capacity = 75000000
>>>>>> agent1.channels.ch2.transactionCapacity = 4000
>>>>>>
>>>>>>
>>>>>>
>>>>>> # Define an Avro source named avroSource1
>>>>>> # Each sink can connect to only one channel.
>>>>>> # Connect it to channel ch1. Load balance it to 2 avroSinks
>>>>>>
>>>>>>
>>>>>> agent1.sources.avroSource1.channels = ch1
>>>>>> agent1.sources.avroSource1.type = avro
>>>>>> agent1.sources.avroSource1.bind = 0.0.0.0
>>>>>> agent1.sources.avroSource1.port = <port>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> agent1.sinks.avroSink1-1-1.type = avro
>>>>>> agent1.sinks.avroSink1-1-1.channel = ch1
>>>>>> agent1.sinks.avroSink1-1-1.hostname = <hostname>
>>>>>> agent1.sinks.avroSink1-1-1.port = <port>
>>>>>> agent1.sinks.avroSink1-1-1.connect-timeout = 300000
>>>>>> agent1.sinks.avroSink1-1-1.batch-size = 4000
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> agent1.sinks.avroSink1-2-1.type = avro
>>>>>> agent1.sinks.avroSink1-2-1.channel = ch1
>>>>>> agent1.sinks.avroSink1-2-1.hostname = <hostname>
>>>>>> agent1.sinks.avroSink1-2-1.port = <port>
>>>>>> agent1.sinks.avroSink1-2-1.connect-timeout = 300000
>>>>>> agent1.sinks.avroSink1-2-1.batch-size = 4000
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> agent1.sinks.avroSink1-3-1.type = avro
>>>>>> agent1.sinks.avroSink1-3-1.channel = ch1
>>>>>> agent1.sinks.avroSink1-3-1.hostname = <hostname>
>>>>>> agent1.sinks.avroSink1-3-1.port = <port>
>>>>>> agent1.sinks.avroSink1-3-1.connect-timeout = 300000
>>>>>> agent1.sinks.avroSink1-3-1.batch-size = 4000
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> agent1.sinks.avroSink1-4-1.type = avro
>>>>>> agent1.sinks.avroSink1-4-1.channel = ch1
>>>>>> agent1.sinks.avroSink1-4-1.hostname = <hostname>
>>>>>> agent1.sinks.avroSink1-4-1.port = <port>
>>>>>> agent1.sinks.avroSink1-4-1.connect-timeout = 300000
>>>>>> agent1.sinks.avroSink1-4-1.batch-size = 4000
>>>>>>
>>>>>>
>>>>>>
>>>>>> # Add the sink groups; load-balance between each group of sinks which
>>>>>> # round robin between different hops
>>>>>> agent1.sinkgroups.group1.sinks = avroSink1-1-1 avroSink1-2-1
*P* | (415) 677-9222 ext. 205 *F *| (415) 677-0895 | [EMAIL PROTECTED]

Pankaj Gupta | Software Engineer

*BrightRoll, Inc. *| Smart Video Advertising | www.brightroll.com
United States | Canada | United Kingdom | Germany
We're hiring<http://newton.newtonsoftware.com/career/CareerHome.action?clientId=8a42a12b3580e2060135837631485aa7>
!