Re: Lock contention in FileChannel
Just to provide an update here. With sink groups removed and two dataDirs per disk, for a total of 32 Avro sinks, I am able to get reasonable throughput of up to 100 Mbps. @brock, thanks for your suggestion about using multiple dataDirs per channel, even on the same disk. It seems we were not i/o bound but concurrency bound; the relevant channel settings are sketched just below. To get concrete data I also added some logging to AbstractRpcSink (which the Avro sink extends); that diff follows the sketch.
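For reference, here is a minimal sketch of the kind of file channel configuration I am describing, with two dataDirs on the same physical disk for one channel. The agent/channel names and paths are made up for illustration, not our actual config:

# one FileChannel, two data directories on the same disk
agent.channels = ch1
agent.channels.ch1.type = file
agent.channels.ch1.checkpointDir = /disk1/flume/checkpoint
agent.channels.ch1.dataDirs = /disk1/flume/data1,/disk1/flume/data2
agent.channels.ch1.capacity = 1000000
agent.channels.ch1.transactionCapacity = 10000

The logging diff: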
diff --git a/flume-ng-core/src/main/java/org/apache/flume/sink/AbstractRpcSink.java b/flume-ng-core/src/main/java/org/apache/flume/sink/AbstractRpcSink.java
index 892c949..68c43ef 100644
--- a/flume-ng-core/src/main/java/org/apache/flume/sink/AbstractRpcSink.java
+++ b/flume-ng-core/src/main/java/org/apache/flume/sink/AbstractRpcSink.java
@@ -315,19 +315,24 @@ public abstract class AbstractRpcSink extends AbstractSink

   @Override
   public Status process() throws EventDeliveryException {
+    long startTime = System.currentTimeMillis();
     Status status = Status.READY;
     Channel channel = getChannel();
     Transaction transaction = channel.getTransaction();
-
     try {
+      // Measure time spent in this block
       transaction.begin();

       verifyConnection();

       List<Event> batch = Lists.newLinkedList();
+      long takeTimeSum = 0;

       for (int i = 0; i < client.getBatchSize(); i++) {
+
+        long takeStart = System.currentTimeMillis();
         Event event = channel.take();
+        takeTimeSum += (System.currentTimeMillis() - takeStart);

         if (event == null) {
           break;
@@ -349,10 +354,26 @@ public abstract class AbstractRpcSink extends AbstractSink
           sinkCounter.incrementBatchCompleteCount();
         }
         sinkCounter.addToEventDrainAttemptCount(size);
+
+        long appendBatchStart = System.currentTimeMillis();
         client.appendBatch(batch);
+        long appendBatchTime = System.currentTimeMillis() - appendBatchStart;
+        if (size > 0) {
+          logger.debug("<append> time for batch-size: " + size + ": " + appendBatchTime + "ms");
+          logger.debug("<append> rate: " + (appendBatchTime * 1000)/ size);
+        }
       }

       transaction.commit();
+
+      if (size > 0) {
+        long processTime = System.currentTimeMillis() - startTime;
+        logger.debug("<process> time for batch-size: " + size + ": " + processTime + "ms");
+        logger.debug("<process> rate: " + (processTime * 1000)/ size);
+        logger.debug("<takes> time for batch-size: " + size + ": " + takeTimeSum + "ms");
+        logger.debug("<take> rate: " + (takeTimeSum * 1000)/ size);
+      }
+
       sinkCounter.addToEventDrainSuccessCount(size);

     } catch (Throwable t) {

Here are some numbers for the time taken (ms) per 1000 takes:
<take> rate: 842
<take> rate: 800
<take> rate: 540
<take> rate: 642
<take> rate: 624
<take> rate: 627
<take> rate: 1011
<take> rate: 1632
<take> rate: 1097
<take> rate: 1542
<take> rate: 1650
<take> rate: 375
<take> rate: 365
<take> rate: 628
<take> rate: 127
<take> rate: 1833
<take> rate: 616
<take> rate: 2169
<take> rate: 1120
<take> rate: 625
<take> rate: 529
<take> rate: 152
<take> rate: 649
<take> rate: 137
<take> rate: 147
<take> rate: 138
<take> rate: 67
<take> rate: 820
<take> rate: 765
<take> rate: 920
<take> rate: 913
<take> rate: 410
<take> rate: 700
<take> rate: 500
<take> rate: 41
<take> rate: 644
<take> rate: 383
<take> rate: 1025
<take> rate: 1366

Here are the corresponding numbers for the time taken (ms) per 1000 appends:
<append> rate: 171
<append> rate: 276
<append> rate: 491
<append> rate: 271
<append> rate: 327
<append> rate: 100
<append> rate: 117
<append> rate: 153
<append> rate: 211
<append> rate: 452
<append> rate: 433
<append> rate: 392
<append> rate: 482
<append> rate: 557
<append> rate: 500
<append> rate: 179
<append> rate: 192
<append> rate: 220
<append> rate: 174
<append> rate: 175
<append> rate: 171
<append> rate: 171
<append> rate: 268
<append> rate: 55
<append> rate: 325
<append> rate: 326
<append> rate: 56
<append> rate: 833
<append> rate: 1666
<append> rate: 142
<append> rate: 192
<append> rate: 172
<append> rate: 165
<append> rate: 130
<append> rate: 283
<append> rate: 68
<append> rate: 385
<append> rate: 210
<append> rate: 549
<append> rate: 317

(Sorry about using the word "rate" here; it's actually the opposite: time per 1000 actions, so smaller is better. For example, a 1000-event batch that took 300 ms gets logged as 300. I realized this a little late.)

With this logging in place I saw roughly 1000 ms per 1000 takes vs. roughly 300 ms per 1000 appends. Please note that an append sends events over the wire and waits for an ok; the fact that it is still faster than takes clearly pointed to a problem with the takes. By increasing dataDirs to 2, both on the same disk, the take numbers dropped by half or more. Here are the corresponding take numbers using two dataDirs per disk instead of one:
<take> rate: 51
<take> rate: 36
<take> rate: 27
<take> rate: 53
<take> rate: 44
<take> rate: 80
<take> rate: 272
<take> rate: 62
<take> rate: 91
<take> rate: 75
<take> rate: 118
<take> rate: 87
<take> rate: 31
<take> rate: 46
<take> rate: 72
<take> rate: 260
<take> rate: 339
<take> rate: 332
<take> rate: 280
<take> rate: 100
<take> rate: 113
<take> rate: 63
<take> rate: 277
<take> rate: 364
<take> rate: 278
<take> rate: 120
<take> rate: 214
<take> rate: 290
<take> rate: 39
<take> rate: 76
<take> rate: 61
<take> rate: 96
<take> rate: 129
<take> rate: 153
<take> rate: 85
<take> rate: 61
<take> rate: 47
<take> rate: 141
<take> rate: 350
<take> rate: 308
<take> rate: 157

Please also note that the takes above don't even involve a commit, so there is no flush to disk. I strongly suspect the bottleneck here is not just disk i/o but also locks. I also saw in the code that both put and take in LogFile use the same lock, so the contention is not just between sinks but between the source and the sinks. I don't know the details of the design, so I am not sure why reads and writes need to take the same lock, but if they could be different that should help with performance; a rough sketch of what I mean is below.
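Purely to illustrate the idea (this is not Flume's LogFile, just a toy sketch that assumes a take only ever reads data that has already been fully written): with one shared lock every put blocks every take, whereas with separate locks writers only contend with writers and readers with readers.

import java.util.concurrent.locks.ReentrantLock;

// Toy append-only log, NOT Flume code: put and take use separate locks.
public class SplitLockLog {
  private final ReentrantLock putLock  = new ReentrantLock(); // serializes writers only
  private final ReentrantLock takeLock = new ReentrantLock(); // serializes readers only

  private final byte[][] slots = new byte[1024][]; // toy storage, no capacity/wraparound handling
  private volatile int committed = 0; // index just past the last fully written slot
  private int writePos = 0;           // touched only under putLock
  private int readPos  = 0;           // touched only under takeLock

  public void put(byte[] event) {
    putLock.lock();
    try {
      slots[writePos++] = event;
      committed = writePos;           // volatile write publishes the slot to readers
    } finally {
      putLock.unlock();
    }
  }

  public byte[] take() {
    takeLock.lock();
    try {
      if (readPos >= committed) {     // volatile read: never touch a half-written slot
        return null;
      }
      return slots[readPos++];
    } finally {
      takeLock.unlock();
    }
  }
}

The volatile committed counter is what makes the split safe in this toy: a take never looks at a slot until the put that wrote it has finished, so readers and writers never need to share a lock. Whether something like this can work in LogFile with checkpoints, rollbacks and file rotation in the picture, I can't say.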

100Mbps is much better than 25Mbps, th