Flume, mail # user - Flume 1.3.0 - NFS + File Channel Performance


Re: Flume 1.3.0 - NFS + File Channel Performance
Brock Noland 2012-12-18, 21:08
Hi,

If you do have a chance, it would be great to hear if the patch attached
to this JIRA (https://issues.apache.org/jira/browse/FLUME-1794) fixes
the performance problem.

Brock

On Tue, Dec 18, 2012 at 11:25 AM, Brock Noland <[EMAIL PROTECTED]> wrote:
> Yeah, I think we should do that check in the background and then update
> a flag. This is how HDFS and MapReduce do it.
>
> On Tue, Dec 18, 2012 at 11:04 AM, Hari Shreedharan
> <[EMAIL PROTECTED]> wrote:
>> Yep. The disk space calls require an NFS call for each write, and that slows
>> things down a lot.
>>
>> --
>> Hari Shreedharan
>>
>> On Tuesday, December 18, 2012 at 8:43 AM, Brock Noland wrote:
>>
>> We'd need those thread dumps to help confirm but I bet that FLUME-1609
>> results in an NFS call on each operation on the channel.
>>
>> If that is true, that would explain why it works well on local disk.
>>
>> Brock
>>
>> On Tue, Dec 18, 2012 at 10:17 AM, Brock Noland <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>> Hmm, yes in general performance is not going to be great over NFS, but
>> there haven't been any FC changes that stick out here.
>>
>> Could you take 10 thread dumps of the agent running the file channel
>> and 10 thread dumps of the agent sending data to the agent with the
>> file channel? (You can send them to me directly since the list
>> won't take attachments.)
>>
>> Are there any patterns, like it works for 40 seconds then times out
>> and then works for 39 seconds, etc?
>>
>> Brock
>>
>> On Tue, Dec 18, 2012 at 10:07 AM, Rakos, Rudolf
>> <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>>
>>
>> We’ve run into a strange problem regarding NFS and File Channel performance
>> while evaluating the new version of Flume.
>>
>> We had no issues with the previous version (1.2.0).
>>
>>
>>
>> Our configuration looks like this:
>>
>> · Node1:
>> (Avro RPC Clients ->) Avro Source and Custom Sources -> File Channel -> Avro
>> Sink (-> Node 2)
>>
>> · Node2:
>> (Node1s ->) Avro Source -> File Channel -> Custom Sink
>>
>>
>>
>> Both the checkpoint and the data directories of the File Channels are on NFS
>> shares. We use the same share for checkpoint and data directories, but
>> different shares for each Node. Unfortunately it is not an option for us to
>> use local directories.
>>
>> The events are about 1 KB each, and the batch sizes are the following:
>>
>> · Avro RPC Clients: 1000
>>
>> · Custom Sources: 2000
>>
>> · Avro Sink: 5000
>>
>> · Custom Sink: 10000
>>
>>
>>
>> We are experiencing very slow File Channel performance compared to the
>> previous version, and a high number of timeouts (almost always) in the Avro
>> RPC Clients and the Avro Sink.
>>
>> Something like this:
>>
>> · 2012-12-18 15:43:31,828
>> [SinkRunner-PollingRunner-ExceptionCatchingSinkProcessor] WARN
>> org.apache.flume.sink.AvroSink - Failed to send event batch
>> org.apache.flume.EventDeliveryException: NettyAvroRpcClient { host: ***,
>> port: *** }: Failed to send batch
>> at
>> org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:236)
>> ~[flume-ng-sdk-1.3.0.jar:1.3.0]
>> ***
>> at
>> org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>> [flume-ng-core-1.3.0.jar:1.3.0]
>> at java.lang.Thread.run(Thread.java:662) [na:1.6.0_31]
>> Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient {
>> host: ***, port: *** }: Handshake timed out after 20000ms
>> at
>> org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:280)
>> ~[flume-ng-sdk-1.3.0.jar:1.3.0]
>> at
>> org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:224)
>> ~[flume-ng-sdk-1.3.0.jar:1.3.0]
>> ... 5 common frames omitted
>> Caused by: java.util.concurrent.TimeoutException: null
>> at
>> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:228)
>> ~[na:1.6.0_31]
>> at java.util.concurrent.FutureTask.get(FutureTask.java:91)
>> ~[na:1.6.0_31]
>> at
>> org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:278)

Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
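
The idea discussed in the thread (do the disk space check in the background and update a flag, rather than querying the file system on every channel operation) can be sketched roughly as follows. This is only an illustration and not the FLUME-1794 patch: the class name CachedSpaceCheck, its methods, and the threshold and interval parameters are all hypothetical, and it assumes Java 8 or later.

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper, not part of Flume: caches the result of a disk space
// check so that the hot path never touches the (possibly NFS-backed) file system.
final class CachedSpaceCheck {
    private final File dir;
    private final long minUsableBytes;
    private final AtomicBoolean hasSpace = new AtomicBoolean(true);
    private final ScheduledExecutorService scheduler;

    CachedSpaceCheck(File dir, long minUsableBytes) {
        this.dir = dir;
        this.minUsableBytes = minUsableBytes;
        // Daemon thread so the checker never keeps the JVM alive on its own.
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "space-check");
            t.setDaemon(true);
            return t;
        });
    }

    // The only place that queries the file system; on NFS this may be a remote call.
    public void refresh() {
        hasSpace.set(dir.getUsableSpace() >= minUsableBytes);
    }

    // Check once immediately, then re-check periodically in the background.
    public void start(long intervalMillis) {
        refresh();
        scheduler.scheduleWithFixedDelay(
            this::refresh, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    // Cheap read of the cached flag; intended to be called on every put/take.
    public boolean hasEnoughSpace() {
        return hasSpace.get();
    }

    public void stop() {
        scheduler.shutdownNow();
    }
}
```

With something like this, a channel would call hasEnoughSpace() on each operation and pay only for an in-memory read. The trade-off is that the flag can be stale by up to one interval, so the threshold would need enough headroom to cover what can be written in that time.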