Re: Flume 1.3.0 - NFS + File Channel Performance
Yeah, I think we should do that check in the background and then update
a flag. This is how HDFS and MapReduce do it.
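
A minimal sketch of that background-check idea, not the actual Flume, HDFS,
or MapReduce code: a scheduled task refreshes a volatile flag, and the write
path only reads the flag, so no filesystem (NFS) call happens per write. The
class name CachedDiskSpaceCheck, its methods, and the threshold and interval
parameters are all hypothetical.

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: caches the result of a disk space check in a flag.
class CachedDiskSpaceCheck {
    private final File dir;
    private final long minFreeBytes;
    // Read by writer threads, updated only by the background task.
    private volatile boolean hasSpace = true;
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "disk-space-checker");
            t.setDaemon(true); // do not keep the JVM alive for this task
            return t;
        });

    CachedDiskSpaceCheck(File dir, long minFreeBytes) {
        this.dir = dir;
        this.minFreeBytes = minFreeBytes;
    }

    // The only place that touches the filesystem (one NFS call per refresh).
    void refresh() {
        hasSpace = dir.getUsableSpace() >= minFreeBytes;
    }

    // Start refreshing the flag every intervalMillis milliseconds.
    void start(long intervalMillis) {
        scheduler.scheduleWithFixedDelay(
            this::refresh, 0, intervalMillis, TimeUnit.MILLISECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }

    // Cheap check for the write path: a volatile read, no I/O.
    boolean hasEnoughSpace() {
        return hasSpace;
    }
}
```

The trade-off is that the flag can be stale for up to one interval, so the
threshold needs enough headroom to absorb whatever can be written in that
time.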

On Tue, Dec 18, 2012 at 11:04 AM, Hari Shreedharan
<[EMAIL PROTECTED]> wrote:
> Yep. The disk space calls require an NFS call for each write, and that slows
> things down a lot.
>
> --
> Hari Shreedharan
>
> On Tuesday, December 18, 2012 at 8:43 AM, Brock Noland wrote:
>
> We'd need those thread dumps to help confirm, but I bet that FLUME-1609
> results in an NFS call on each operation on the channel.
>
> If that is true, that would explain why it works well on local disk.
>
> Brock
>
> On Tue, Dec 18, 2012 at 10:17 AM, Brock Noland <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> Hmm, yes in general performance is not going to be great over NFS, but
> there haven't been any FC changes that stick out here.
>
> Could you take 10 thread dumps of the agent running the file channel
> and 10 thread dumps of the agent sending data to the agent with the
> file channel? (You can send them to me directly, since the list
> won't take attachments.)
>
> Are there any patterns, like it works for 40 seconds then times out
> and then works for 39 seconds, etc?
>
> Brock
>
> On Tue, Dec 18, 2012 at 10:07 AM, Rakos, Rudolf
> <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
>
>
> We’ve run into a strange problem regarding NFS and File Channel performance
> while evaluating the new version of Flume.
>
> We had no issues with the previous version (1.2.0).
>
>
>
> Our configuration looks like this:
>
> · Node1:
> (Avro RPC Clients ->) Avro Source and Custom Sources -> File Channel -> Avro
> Sink (-> Node 2)
>
> · Node2:
> (Node1s ->) Avro Source -> File Channel -> Custom Sink
>
>
>
> Both the checkpoint and the data directories of the File Channels are on NFS
> shares. We use the same share for checkpoint and data directories, but
> different shares for each Node. Unfortunately it is not an option for us to
> use local directories.
>
> The events are about 1 KB each, and the batch sizes are the following:
>
> · Avro RPC Clients: 1000
>
> · Custom Sources: 2000
>
> · Avro Sink: 5000
>
> · Custom Sink: 10000
>
>
>
> We are experiencing very slow File Channel performance compared to the
> previous version, and a high number of timeouts (almost always) in the Avro
> RPC Clients and the Avro Sink.
>
> Something like this:
>
> · 2012-12-18 15:43:31,828 [SinkRunner-PollingRunner-ExceptionCatchingSinkProcessor] WARN org.apache.flume.sink.AvroSink - Failed to send event batch
> org.apache.flume.EventDeliveryException: NettyAvroRpcClient { host: ***, port: *** }: Failed to send batch
>     at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:236) ~[flume-ng-sdk-1.3.0.jar:1.3.0]
>     ***
>     at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147) [flume-ng-core-1.3.0.jar:1.3.0]
>     at java.lang.Thread.run(Thread.java:662) [na:1.6.0_31]
> Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient { host: ***, port: *** }: Handshake timed out after 20000ms
>     at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:280) ~[flume-ng-sdk-1.3.0.jar:1.3.0]
>     at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:224) ~[flume-ng-sdk-1.3.0.jar:1.3.0]
>     ... 5 common frames omitted
> Caused by: java.util.concurrent.TimeoutException: null
>     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:228) ~[na:1.6.0_31]
>     at java.util.concurrent.FutureTask.get(FutureTask.java:91) ~[na:1.6.0_31]
>     at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:278) ~[flume-ng-sdk-1.3.0.jar:1.3.0]
>     ... 6 common frames omitted
>
> (I had to remove some details, sorry for that.)
>
>
>
> We managed to narrow down the root cause of the issue to the File Channel,
> because:
>
> · Everything works fine if we switch to the Memory Channel or to the
> Old File Channel (1.2.0).
>
> · Everything works fine if we use local directories.
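
One way to test the theory discussed in the replies (that a per-write disk
space lookup is what hurts on NFS) is to time that lookup directly on the NFS
directory and on a local one. The sketch below assumes the lookup is
equivalent to File.getUsableSpace(), which this thread does not confirm; the
class and method names are made up for illustration.

```java
import java.io.File;

// Hypothetical timing harness for the cost of one disk space lookup.
class UsableSpaceTimer {
    // Average wall-clock nanoseconds per File.getUsableSpace() call on dir.
    static long averageNanos(File dir, int calls) {
        if (calls <= 0) {
            throw new IllegalArgumentException("calls must be positive");
        }
        long last = 0;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            last = dir.getUsableSpace(); // may be a remote call on NFS
        }
        long elapsed = System.nanoTime() - start;
        if (last < 0) {
            System.out.println(last); // keeps the loop result in use
        }
        return elapsed / calls;
    }

    public static void main(String[] args) {
        // Pass the directory to measure; defaults to the current directory.
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println(dir + ": " + averageNanos(dir, 1000) + " ns per call");
    }
}
```

Running it once against the File Channel data directory on the NFS share and
once against a local directory gives a rough per-call cost to compare; thread
dumps are still the more direct evidence of where the agent is blocked.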

Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/