Flume >> mail # user >> Lock contention in FileChannel


Re: Lock contention in FileChannel
I tried increasing the dataDirs to 2 and 4 per disk, but that doesn't seem to
help much. I then replaced the avro sinks with null sinks, and events are
still filling up in the channel. With either 2 or 4 dataDirs per disk and
null sinks, I still don't get throughput higher than 1.5 MBps.
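
For reference, a minimal sketch of the kind of variant described above, not the exact config that was run. It assumes two data directories on the first disk and a null sink in place of the avro sinks; the directory names /flume1/data1 and /flume1/data2 and the sink name nullSink1 are hypothetical, and the remaining ch1 settings are taken as unchanged from the config quoted below.

```properties
# ch1 with two data directories on the same disk (dataDirs is a comma-separated list)
agent1.channels.ch1.type = FILE
agent1.channels.ch1.checkpointDir = /flume1/checkpoint
agent1.channels.ch1.dataDirs = /flume1/data1,/flume1/data2

# null sink draining ch1 in place of the avro sinks (sink name is hypothetical)
agent1.sinks.nullSink1.type = null
agent1.sinks.nullSink1.channel = ch1
```
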
On Tue, Aug 13, 2013 at 8:30 PM, Brock Noland <[EMAIL PROTECTED]> wrote:

> Increasing the number of file channels will result in more checkpoints.
> Therefore there will be more io than simply increasing the number of
> dataDirs.  However, this might be a case where it'd be nice to relax the
> file channel data consistency constraints a little to get increased
> throughput. That feature does not exist at present.
>
>
> On Tue, Aug 13, 2013 at 10:16 PM, Pankaj Gupta <[EMAIL PROTECTED]> wrote:
>
>> I did try increasing the number of FileChannels. At 2 FileChannels per disk,
>> performance seemed to be 25% better. At 4 FileChannels per disk, performance
>> dropped even below that of 1 FileChannel per disk. I will try increasing the
>> dataDirs tomorrow.
>>
>>
>> On Tue, Aug 13, 2013 at 8:06 PM, Brock Noland <[EMAIL PROTECTED]> wrote:
>>
>>> dataDirs is a comma-separated list. Try 3-4 directories and then run the
>>> same test.
>>> On Aug 13, 2013 9:58 PM, "Pankaj Gupta" <[EMAIL PROTECTED]> wrote:
>>>
>>>> Both disks were at around 15-25%.
>>>>
>>>>
>>>> On Tue, Aug 13, 2013 at 7:54 PM, Brock Noland <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Gotcha. When you run the test, what is the disk utilization percentage?
>>>>> iostat can be used for this.
>>>>> On Aug 13, 2013 9:47 PM, "Pankaj Gupta" <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Those are the boxes we want to collect data from. They run flume and
>>>>>> send data through their avro sinks to the avro source on this box. We are
>>>>>> getting data at a pretty good rate and the problem is in fact that the
>>>>>> events don't drain from the FileChannel fast enough and the channel fill
>>>>>> percentage keeps getting higher.
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 13, 2013 at 7:41 PM, Brock Noland <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> What is sending the events to the avro source?
>>>>>>> On Aug 13, 2013 9:34 PM, "Pankaj Gupta" <[EMAIL PROTECTED]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Here's the config:
>>>>>>>> # define channels, one for each disk
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> agent1.channels.ch1.type = FILE
>>>>>>>> agent1.channels.ch1.checkpointDir = /flume1/checkpoint
>>>>>>>> agent1.channels.ch1.dataDirs = /flume1/data
>>>>>>>> agent1.channels.ch1.maxFileSize = 375809638400
>>>>>>>> agent1.channels.ch1.capacity = 75000000
>>>>>>>> agent1.channels.ch1.transactionCapacity = 4000
>>>>>>>>
>>>>>>>> agent1.channels.ch2.type = FILE
>>>>>>>> agent1.channels.ch2.checkpointDir = /flume2/checkpoint
>>>>>>>> agent1.channels.ch2.dataDirs = /flume2/data
>>>>>>>> agent1.channels.ch2.maxFileSize = 375809638400
>>>>>>>> agent1.channels.ch2.capacity = 75000000
>>>>>>>> agent1.channels.ch2.transactionCapacity = 4000
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> # Define an Avro source named avroSource1
>>>>>>>> # Each sink can connect to only one channel.
>>>>>>>> # Connect it to channel ch1. Load balance it to 2 avroSinks
>>>>>>>>
>>>>>>>>
>>>>>>>> agent1.sources.avroSource1.channels = ch1
>>>>>>>> agent1.sources.avroSource1.type = avro
>>>>>>>> agent1.sources.avroSource1.bind = 0.0.0.0
>>>>>>>> agent1.sources.avroSource1.port = <port>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> agent1.sinks.avroSink1-1-1.type = avro
>>>>>>>> agent1.sinks.avroSink1-1-1.channel = ch1
>>>>>>>> agent1.sinks.avroSink1-1-1.hostname = <hostname>
>>>>>>>> agent1.sinks.avroSink1-1-1.port = <port>
>>>>>>>> agent1.sinks.avroSink1-1-1.connect-timeout = 300000
>>>>>>>> agent1.sinks.avroSink1-1-1.batch-size = 4000
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> agent1.sinks.avroSink1-2-1.type = avro
>>>>>>>> agent1.sinks.avroSink1-2-1.channel = ch1
>>>>>>>> agent1.sinks.avroSink1-2-1.hostname = <hostname>
>>>>>>>> agent1.sinks.avroSink1-2-1.port = <port>

*P* | (415) 677-9222 ext. 205 *F *| (415) 677-0895 | [EMAIL PROTECTED]

Pankaj Gupta | Software Engineer

*BrightRoll, Inc. *| Smart Video Advertising | www.brightroll.com
United States | Canada | United Kingdom | Germany