Flume >> mail # user >> Flume 1.3.0 - NFS + File Channel Performance


Re: Flume 1.3.0 - NFS + File Channel Performance
I can confirm that this fixed issues with a non-NFS FileChannel (the
difference is still very noticeable, and I would recommend that anyone
with high throughput patch this in).
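
For anyone who has not looked at the patch: the idea discussed further down the thread (do the disk-space check in the background and update a flag, instead of checking on every write) can be sketched roughly as below. This is only an illustration, not the code from FLUME-1794. The class name `CachedDiskSpaceCheck`, its constructor parameters, and the check interval are all hypothetical; the only real API used is `java.io.File.getUsableSpace()` and the standard `java.util.concurrent` scheduler.

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (not the FLUME-1794 patch): a background thread
 * refreshes a flag, so the write path never queries the file system itself.
 */
class CachedDiskSpaceCheck implements AutoCloseable {
    private final File dir;
    private final long minFreeBytes;
    private final ScheduledExecutorService scheduler;
    // volatile so that writer threads see the value set by the background thread
    private volatile boolean enoughSpace;

    CachedDiskSpaceCheck(File dir, long minFreeBytes, long intervalMillis) {
        this.dir = dir;
        this.minFreeBytes = minFreeBytes;
        refresh(); // one synchronous check so the flag is valid from the start
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "disk-space-check");
            t.setDaemon(true); // do not keep the JVM alive for this check
            return t;
        });
        scheduler.scheduleWithFixedDelay(
            this::refresh, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    /** The only place that touches the file system (an NFS round trip on NFS). */
    private void refresh() {
        enoughSpace = dir.getUsableSpace() >= minFreeBytes;
    }

    /** Cheap check for the write path: a single volatile read. */
    boolean hasEnoughSpace() {
        return enoughSpace;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

A writer would call `hasEnoughSpace()` before each put or commit. The trade-off is that the flag can be stale for up to one interval, so the threshold needs some headroom.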

On 12/19/2012 06:08 AM, Brock Noland wrote:
> Hi,
>
> If you do have a chance, it would be great to hear if the patch attached
> to this JIRA (https://issues.apache.org/jira/browse/FLUME-1794) fixes
> the performance problem.
>
> Brock
>
> On Tue, Dec 18, 2012 at 11:25 AM, Brock Noland <[EMAIL PROTECTED]> wrote:
>> Yeah, I think we should do that check in the background and then update
>> a flag. This is how HDFS and mapred do it.
>>
>> On Tue, Dec 18, 2012 at 11:04 AM, Hari Shreedharan
>> <[EMAIL PROTECTED]> wrote:
>>> Yep. The disk space calls require an NFS call for each write, and that slows
>>> things down a lot.
>>>
>>> --
>>> Hari Shreedharan
>>>
>>> On Tuesday, December 18, 2012 at 8:43 AM, Brock Noland wrote:
>>>
>>> We'd need those thread dumps to help confirm, but I bet that FLUME-1609
>>> results in an NFS call on each operation on the channel.
>>>
>>> If that is true, that would explain why it works well on local disk.
>>>
>>> Brock
>>>
>>> On Tue, Dec 18, 2012 at 10:17 AM, Brock Noland <[EMAIL PROTECTED]> wrote:
>>>
>>> Hi,
>>>
>>> Hmm, yes in general performance is not going to be great over NFS, but
>>> there haven't been any FC changes that stick out here.
>>>
>>> Could you take 10 thread dumps of the agent running the file channel
>>> and 10 thread dumps of the agent sending data to the agent with the
>>> file channel? (You can send them to me directly since the list
>>> won't take attachments.)
>>>
>>> Are there any patterns, like it works for 40 seconds then times out
>>> and then works for 39 seconds, etc?
>>>
>>> Brock
>>>
>>> On Tue, Dec 18, 2012 at 10:07 AM, Rakos, Rudolf
>>> <[EMAIL PROTECTED]> wrote:
>>>
>>> Hi,
>>>
>>>
>>>
>>> We've run into a strange problem regarding NFS and File Channel performance
>>> while evaluating the new version of Flume.
>>>
>>> We had no issues with the previous version (1.2.0).
>>>
>>>
>>>
>>> Our configuration looks like this:
>>>
>>> • Node1:
>>> (Avro RPC Clients ->) Avro Source and Custom Sources -> File Channel -> Avro
>>> Sink (-> Node 2)
>>>
>>> • Node2:
>>> (Node1s ->) Avro Source -> File Channel -> Custom Sink
>>>
>>>
>>>
>>> Both the checkpoint and the data directories of the File Channels are on NFS
>>> shares. We use the same share for checkpoint and data directories, but
>>> different shares for each Node. Unfortunately it is not an option for us to
>>> use local directories.
>>>
>>> The events are about 1 KB each, and the batch sizes are the following:
>>>
>>> • Avro RPC Clients: 1000
>>>
>>> • Custom Sources: 2000
>>>
>>> • Avro Sink: 5000
>>>
>>> • Custom Sink: 10000
>>>
>>>
>>>
>>> We are experiencing very slow File Channel performance compared to the
>>> previous version, and a high number of timeouts (almost always) in the Avro
>>> RPC Clients and the Avro Sink.
>>>
>>> Something like this:
>>>
>>> • 2012-12-18 15:43:31,828
>>> [SinkRunner-PollingRunner-ExceptionCatchingSinkProcessor] WARN
>>> org.apache.flume.sink.AvroSink - Failed to send event batch
>>> org.apache.flume.EventDeliveryException: NettyAvroRpcClient { host: ***,
>>> port: *** }: Failed to send batch
>>> at
>>> org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:236)
>>> ~[flume-ng-sdk-1.3.0.jar:1.3.0]
>>> ***
>>> at
>>> org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>> [flume-ng-core-1.3.0.jar:1.3.0]
>>> at java.lang.Thread.run(Thread.java:662) [na:1.6.0_31]
>>> Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient {
>>> host: ***, port: *** }: Handshake timed out after 20000ms
>>> at
>>> org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:280)
>>> ~[flume-ng-sdk-1.3.0.jar:1.3.0]
>>> at
>>> org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:224)
>>> ~[flume-ng-sdk-1.3.0.jar:1.3.0]