Flume >> mail # user >> Lock contention in FileChannel


Re: Lock contention in FileChannel
Can you share your conf file?
On Aug 13, 2013 9:19 PM, "Hari Shreedharan" <[EMAIL PROTECTED]>
wrote:

> Even though the writes are done per batch, they don't go to disk right
> away. Commits are the only operations that cause an fsync, which is when
> the writes actually reach the disk.
>
>
> Thanks,
> Hari
>
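
To illustrate the distinction drawn above, here is a minimal sketch using plain java.nio. It is not Flume's code, and the names BatchCommitSketch and writeBatch are hypothetical. Each event is written with FileChannel.write, which typically only hands the bytes to the OS page cache; FileChannel.force is called once at the end of the batch, and that is the call that asks the OS to push the data to the storage device.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

public class BatchCommitSketch {

    // Hypothetical helper: append every event, then sync once at "commit".
    // Returns the file size after the batch has been forced.
    public static long writeBatch(Path file, List<String> events) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            for (String event : events) {
                ByteBuffer buf = ByteBuffer.wrap(event.getBytes(StandardCharsets.UTF_8));
                // write() may be partial, so loop until the buffer is drained.
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
            }
            // One sync per batch; false means file metadata need not be forced.
            ch.force(false);
            return ch.size();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("batch-commit", ".log");
        long size = writeBatch(tmp, Arrays.asList("event-1\n", "event-2\n", "event-3\n"));
        System.out.println("bytes in file after commit: " + size);
        Files.delete(tmp);
    }
}
```

In a sketch like this the per-event cost is a system call and a memory copy, while the expensive device sync is paid once per batch, so larger batches amortize it over more events.
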
> On Tuesday, August 13, 2013 at 7:06 PM, Pankaj Gupta wrote:
>
> Looking at the code, it seems that the locking and the I/O are done per
> event and not per batch. Is that correct? If so, there appears to be a lot
> of overhead per event. The throughput I'm seeing is 1 to 1.5 MBps per disk,
> which is far below the sequential read/write capacity of the disk (easily
> over 50 MBps). Adding more sinks doesn't help; they just block waiting for
> the queue to become free. CPU usage is 20%, and there is enough RAM for the
> page cache that no read goes to disk. The queue seems to be the bottleneck.
> What throughput should I expect per disk?
>
>
> On Tue, Aug 13, 2013 at 5:51 PM, Brock Noland <[EMAIL PROTECTED]> wrote:
>
> The lock is per file. Adding more directories to the channel will cause
> more files to be created. Of course you'll need additional disks behind
> those directories to see any performance increase.
>
>
> On Tue, Aug 13, 2013 at 7:14 PM, Hari Shreedharan <
> [EMAIL PROTECTED]> wrote:
>
> Yes, IO is done inside locks to avoid multiple takes and puts getting
> written out at the same time. Even though Java makes sure the writes are
> serialized, Flume still needs to keep track of some counters etc., so the
> lock is required. Note that the lock you are talking about is in the
> LogFile class, which represents a single file. So even if the write is
> inside that lock (which is also inside that class itself), that does not
> cause any contention, because the lock is just preventing two IO ops from
> happening at the same time.
>
>
> Thanks,
> Hari
>
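
The per-file locking described in the two messages above can be sketched roughly as follows. This is not Flume's LogFile implementation; PerFileWriterSketch, Writer, append and pick are hypothetical names, and the round-robin choice of writer is an assumption made only for illustration. Each writer owns one file and one lock. The write and the position counter are updated together under that lock, so threads appending to the same file are serialized, while threads using writers for files in different directories never wait on each other's lock.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.atomic.AtomicInteger;

public class PerFileWriterSketch {

    /** One writer per file; its monitor is the only lock guarding that file. */
    public static final class Writer implements AutoCloseable {
        private final FileChannel channel;
        private long position; // bookkeeping that must stay in step with the file

        Writer(Path file) throws IOException {
            channel = FileChannel.open(file,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        }

        /** Appends a record and returns the offset at which it was written. */
        public synchronized long append(byte[] record) throws IOException {
            long offset = position;
            ByteBuffer buf = ByteBuffer.wrap(record);
            // The IO happens inside the lock, so the counter and the file
            // contents cannot drift apart under concurrent appends.
            while (buf.hasRemaining()) {
                position += channel.write(buf, position);
            }
            return offset;
        }

        public synchronized long position() {
            return position;
        }

        @Override
        public synchronized void close() throws IOException {
            channel.close();
        }
    }

    private final Writer[] writers;
    private final AtomicInteger next = new AtomicInteger();

    /** One file per data directory in a real setup; here, any set of paths. */
    public PerFileWriterSketch(Path... files) throws IOException {
        writers = new Writer[files.length];
        for (int i = 0; i < files.length; i++) {
            writers[i] = new Writer(files[i]);
        }
    }

    /** Assumed policy: hand out writers round-robin to spread the load. */
    public Writer pick() {
        return writers[Math.floorMod(next.getAndIncrement(), writers.length)];
    }

    public static void main(String[] args) throws IOException {
        Path a = Files.createTempFile("dir1-log", ".bin");
        Path b = Files.createTempFile("dir2-log", ".bin");
        PerFileWriterSketch sketch = new PerFileWriterSketch(a, b);
        for (int i = 0; i < 4; i++) {
            sketch.pick().append(new byte[] {1, 2, 3});
        }
        for (Writer w : sketch.writers) {
            System.out.println("bytes written: " + w.position());
            w.close();
        }
        Files.delete(a);
        Files.delete(b);
    }
}
```

With a single file every append funnels through one lock, and the time the lock is held is dominated by the IO. More files only help if they sit on separate disks, as noted above.
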
> On Tuesday, August 13, 2013 at 5:01 PM, Pankaj Gupta wrote:
>
> It seems that some I/O is done inside the lock, which means that the time
> the lock is held is proportional to the time taken by the I/O, and thus it
> becomes a problem. I apologize in advance if I am wrong, but the call stack
> and behavior I'm seeing seem to suggest that. Specifically, it seems that
> we do a write while inside take:
> "SinkRunner-PollingRunner-LoadBalancingSinkProcessor" prio=10 tid=0x00007f857338c800 nid=0x404a runnable [0x00007f821b2f1000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.NativeThread.current(Native Method)
>         at sun.nio.ch.NativeThreadSet.add(NativeThreadSet.java:27)
>         at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:194)
>         - locked <0x00000005190ec998> (a java.lang.Object)
>         at org.apache.flume.channel.file.LogFile$Writer.write(LogFile.java:247)
>         at org.apache.flume.channel.file.LogFile$Writer.take(LogFile.java:212)
>         - locked <0x0000000519111590> (a org.apache.flume.channel.file.LogFileV3$Writer)
>         at org.apache.flume.channel.file.Log.take(Log.java:550)
>         at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doTake(FileChannel.java:499)
>         at org.apache.flume.channel.BasicTransactionSemantics.take(BasicTransactionSemantics.java:113)
>         at org.apache.flume.channel.BasicChannelSemantics.take(BasicChannelSemantics.java:95)
>         at org.apache.flume.sink.AbstractRpcSink.process(AbstractRpcSink.java:330)
>         at org.apache.flume.sink.LoadBalancingSinkProcessor.process(LoadBalancingSinkProcessor.java:154)
>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>         at java.lang.Thread.run(Thread.java:662)
>
>
>
> On Tue, Aug 13, 2013 at 4:39 PM, Hari Shreedharan <
> [EMAIL PROTECTED]> wrote:
>
> Since the channel is designed to make sure that events are not duplicated
> to multiple sinks, and to protect against corruption due to concurrency
> issues, we do need the locking in the channel's flume event queue. It