Flume >> mail # user >> Flume 1.3.0 - NFS + File Channel Performance


Rakos, Rudolf 2012-12-18, 16:07
Re: Flume 1.3.0 - NFS + File Channel Performance
Hi,

Hmm, yes in general performance is not going to be great over NFS, but
there haven't been any FC changes that stick out here.

Could you take 10 thread dumps of the agent running the file channel
and 10 thread dumps of the agent sending data to the agent with the
file channel? (You can send them to me directly, since the list
won't take attachments.)
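
One way to collect such a series of dumps is sketched below in Python. It assumes the JDK's `jstack` tool is on the PATH and that the agent's process ID is known. The function name, output file naming, and the 5-second interval are illustrative choices, not anything prescribed in this thread.

```python
import subprocess
import time
from pathlib import Path


def collect_thread_dumps(pid, count=10, interval_s=5.0, out_dir="dumps",
                         dump_cmd=("jstack",)):
    """Run the dump command `count` times and save each output to its own file.

    `dump_cmd` defaults to the JDK's `jstack`; it is a parameter only so the
    helper can be exercised without a running JVM.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(1, count + 1):
        # Equivalent to running `jstack <pid>` on the command line.
        result = subprocess.run([*dump_cmd, str(pid)],
                                capture_output=True, text=True, check=True)
        path = out / f"threaddump-{pid}-{i:02d}.txt"
        path.write_text(result.stdout)
        paths.append(path)
        if i < count:
            time.sleep(interval_s)  # space the dumps out so they show change over time
    return paths
```

Calling `collect_thread_dumps(<agent pid>)` once per agent would produce ten files for each, which can then be sent off-list.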

Are there any patterns, e.g. it works for 40 seconds, then times out,
and then works for 39 seconds, etc.?
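
A quick way to look for such a pattern is to pull the timestamps of the failure lines out of the agent log and compute the gaps between them. The Python sketch below assumes each log entry starts with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp and that failures contain the text "Failed to send event batch", as in the WARN entry quoted below. The function name is hypothetical.

```python
import re
from datetime import datetime

# Leading timestamp such as "2012-12-18 15:43:31,828".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def failure_gaps(lines, marker="Failed to send event batch"):
    """Return the seconds elapsed between consecutive failure log lines."""
    times = []
    for line in lines:
        if marker not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            times.append(datetime.strptime(match.group(0),
                                           "%Y-%m-%d %H:%M:%S,%f"))
    # Pairwise differences; roughly constant gaps suggest a periodic stall.
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

Feeding it the lines of the agent log (for example `failure_gaps(open(path))`, with `path` pointing at the log file) gives a list of intervals that can be scanned by eye for regularity.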

Brock

On Tue, Dec 18, 2012 at 10:07 AM, Rakos, Rudolf
<[EMAIL PROTECTED]> wrote:
> Hi,
>
>
>
> We’ve run into a strange problem regarding NFS and File Channel performance
> while evaluating the new version of Flume.
>
> We had no issues with the previous version (1.2.0).
>
>
>
> Our configuration looks like this:
>
> * Node1:
>   (Avro RPC Clients ->) Avro Source and Custom Sources -> File Channel ->
>   Avro Sink (-> Node2)
>
> * Node2:
>   (Node1s ->) Avro Source -> File Channel -> Custom Sink
>
>
>
> Both the checkpoint and the data directories of the File Channels are on NFS
> shares. We use the same share for checkpoint and data directories, but
> different shares for each Node. Unfortunately it is not an option for us to
> use local directories.
>
> The events are about 1 KB each, and the batch sizes are as follows:
>
> * Avro RPC Clients: 1000
> * Custom Sources: 2000
> * Avro Sink: 5000
> * Custom Sink: 10000
>
>
>
> We are experiencing very slow File Channel performance compared to the
> previous version, and a high number of timeouts (almost always) in the Avro
> RPC Clients and the Avro Sink.
>
> Something like this:
>
> 2012-12-18 15:43:31,828 [SinkRunner-PollingRunner-ExceptionCatchingSinkProcessor] WARN org.apache.flume.sink.AvroSink - Failed to send event batch
> org.apache.flume.EventDeliveryException: NettyAvroRpcClient { host: ***, port: *** }: Failed to send batch
>         at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:236) ~[flume-ng-sdk-1.3.0.jar:1.3.0]
>         ***
>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147) [flume-ng-core-1.3.0.jar:1.3.0]
>         at java.lang.Thread.run(Thread.java:662) [na:1.6.0_31]
> Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient { host: ***, port: *** }: Handshake timed out after 20000ms
>         at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:280) ~[flume-ng-sdk-1.3.0.jar:1.3.0]
>         at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:224) ~[flume-ng-sdk-1.3.0.jar:1.3.0]
>         ... 5 common frames omitted
> Caused by: java.util.concurrent.TimeoutException: null
>         at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:228) ~[na:1.6.0_31]
>         at java.util.concurrent.FutureTask.get(FutureTask.java:91) ~[na:1.6.0_31]
>         at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:278) ~[flume-ng-sdk-1.3.0.jar:1.3.0]
>         ... 6 common frames omitted
>
> (I had to remove some details, sorry for that.)
>
>
>
> We managed to narrow down the root cause of the issue to the File Channel,
> because:
>
> * Everything works fine if we switch to the Memory Channel or to the old
>   File Channel (1.2.0).
>
> * Everything works fine if we use local directories.
>
> We’ve tested this on multiple different PCs (both Windows and Linux).
>
>
>
> I spent the day debugging and profiling, but I could not find anything worth
> mentioning (nothing with excessive CPU usage, no threads waiting too
> long, etc.). The only problem is that File Channel takes and puts take far
> longer than with the previous version.
>
>
>
>
>
> Could someone please try the File Channel on an NFS share?
>
> Does anyone have similar issues?
>
>
>
> Thank you for your help.
>
>
>
> Regards,
>
> Rudolf
>
>
>
> Rudolf Rakos
> Morgan Stanley | ISG Technology
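
To compare the NFS share with a local directory independently of Flume, one option is to time batches of small writes, each batch followed by an fsync. The Python sketch below is only a rough probe, not a reproduction of the File Channel's actual I/O pattern. It rests on the assumption that synchronous flushes dominate the cost on NFS. The function name, the probe file name, and the defaults (1 KB events, 1000 per batch, taken from the sizes mentioned above) are illustrative.

```python
import os
import time


def time_synced_batches(directory, batches=20, events_per_batch=1000,
                        event_size=1024):
    """Write batches of fixed-size records to a file in `directory`,
    fsync after each batch, and return the per-batch durations in seconds."""
    path = os.path.join(directory, "fsync-probe.tmp")
    payload = b"x" * event_size
    durations = []
    try:
        with open(path, "wb") as f:
            for _ in range(batches):
                start = time.perf_counter()
                for _ in range(events_per_batch):
                    f.write(payload)
                f.flush()             # push Python's buffer to the OS
                os.fsync(f.fileno())  # force the OS to persist it to storage
                durations.append(time.perf_counter() - start)
    finally:
        # Clean up the probe file even if a write fails.
        if os.path.exists(path):
            os.remove(path)
    return durations
```

Running it once against a directory on the NFS share and once against a local directory, then comparing the typical and worst per-batch times, gives a first indication of how much of the slowdown comes from the share itself.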

Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/