Flume >> mail # user >> Flume 1.3.0 - NFS + File Channel Performance


RE: Flume 1.3.0 - NFS + File Channel Performance
Brock, Hari,

I can confirm that the patch in FLUME-1794 fixes the performance issue.

I was wondering whether it is possible to ask for a new release (1.3.1) including the recent File Channel bug fixes?

  Trunk: https://git-wip-us.apache.org/repos/asf?p=flume.git;a=history;f=flume-ng-channels/flume-file-channel;h=cc779e886b4d6290723a43b4f874239150d93475;hb=trunk
  1.3.0: https://git-wip-us.apache.org/repos/asf?p=flume.git;a=history;f=flume-ng-channels/flume-file-channel;h=cc93d99eac6d631e9200d122928d5e307621b4fe;hb=refs/heads/flume-1.3.0

Unfortunately we cannot use trunk, and waiting for Flume 1.4.0 could take a few months.
It's not a big problem if we need to stick with Flume 1.2.0, but according to Juhani Connolly this was causing high CPU usage with non-NFS File Channels too, so I think a 1.3.1 release would benefit the wider community as well.

Regards,
Rudolf

-----Original Message-----
From: Rakos, Rudolf (ISGT)
Sent: Wednesday, December 19, 2012 9:10 AM
To: [EMAIL PROTECTED]
Subject: RE: Flume 1.3.0 - NFS + File Channel Performance

Brock, Hari,

Thank you very much for looking so quickly into this.

We're aware that the general performance will not be that great using NFS, but having some "last minute" data on failover scenarios could be worth the performance cost.

You were right.
I've taken some thread dumps and I can confirm that FLUME-1609 (the File.getUsableSpace calls) is causing the issue. (I just don't understand how I could have missed this hot spot during profiling.)
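
[Editor's sketch] The fix direction Brock describes further down the thread (do the space check in the background and update a flag) can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the code in the FLUME-1794 patch: the class name `DiskSpaceMonitor`, its constructor parameters, and the method `hasEnoughSpace()` are all hypothetical. Only `File.getUsableSpace()` and the `java.util.concurrent` classes are real JDK APIs.

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical helper (not Flume's actual class): polls the usable space of a
 * directory on a background thread and exposes the result as a cached flag,
 * so the write path never calls File.getUsableSpace() itself.
 */
class DiskSpaceMonitor implements AutoCloseable {
    private final File dir;
    private final long minimumBytes;
    private final AtomicBoolean hasSpace = new AtomicBoolean();
    private final ScheduledExecutorService scheduler;

    DiskSpaceMonitor(File dir, long minimumBytes, long intervalMillis) {
        this.dir = dir;
        this.minimumBytes = minimumBytes;
        // One synchronous check so the flag is meaningful before the first poll.
        refresh();
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "disk-space-monitor");
            t.setDaemon(true); // do not keep the JVM alive for this poller
            return t;
        });
        scheduler.scheduleWithFixedDelay(
                this::refresh, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    /** The only place that touches the file system (one NFS call per interval). */
    private void refresh() {
        hasSpace.set(dir.getUsableSpace() >= minimumBytes);
    }

    /** Cheap in-memory read, intended to be called on every channel operation. */
    boolean hasEnoughSpace() {
        return hasSpace.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With something like this, each put or take would read `hasEnoughSpace()` instead of calling `File.getUsableSpace()` directly, so the number of NFS round trips depends on the polling interval rather than on the event rate. The trade-off is that the flag can be stale for up to one interval, so the minimum-space threshold needs some headroom.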

I'll check whether the patch in FLUME-1794 fixes this.

Thanks,
Rudolf

-----Original Message-----
From: Brock Noland [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, December 18, 2012 10:09 PM
To: [EMAIL PROTECTED]
Subject: Re: Flume 1.3.0 - NFS + File Channel Performance

Hi,

If you do have a chance, it would be great to hear whether the patch attached to this JIRA (https://issues.apache.org/jira/browse/FLUME-1794) fixes the performance problem.

Brock

On Tue, Dec 18, 2012 at 11:25 AM, Brock Noland <[EMAIL PROTECTED]> wrote:
> Yeah, I think we should do that check in the background and then update
> a flag. This is how HDFS and MapReduce do it.
>
> On Tue, Dec 18, 2012 at 11:04 AM, Hari Shreedharan
> <[EMAIL PROTECTED]> wrote:
>> Yep. The disk space calls require an NFS call for each write, and
>> that slows things down a lot.
>>
>> --
>> Hari Shreedharan
>>
>> On Tuesday, December 18, 2012 at 8:43 AM, Brock Noland wrote:
>>
>> We'd need those thread dumps to help confirm, but I bet that
>> FLUME-1609 results in an NFS call on each operation on the channel.
>>
>> If that is true, that would explain why it works well on local disk.
>>
>> Brock
>>
>> On Tue, Dec 18, 2012 at 10:17 AM, Brock Noland <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>> Hmm, yes in general performance is not going to be great over NFS,
>> but there haven't been any FC changes that stick out here.
>>
>> Could you take 10 thread dumps of the agent running the file channel
>> and 10 thread dumps of the agent sending data to the agent with the
>> file channel? (You can send them to me directly since the list
>> won't take attachments.)
>>
>> Are there any patterns, e.g. it works for 40 seconds, then times out,
>> then works for 39 seconds, and so on?
>>
>> Brock
>>
>> On Tue, Dec 18, 2012 at 10:07 AM, Rakos, Rudolf
>> <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>>
>>
>> We’ve run into a strange problem regarding NFS and File Channel
>> performance while evaluating the new version of Flume.
>>
>> We had no issues with the previous version (1.2.0).
>>
>>
>>
>> Our configuration looks like this:
>>
>> · Node1:
>> (Avro RPC Clients ->) Avro Source and Custom Sources -> File Channel
>> -> Avro Sink (-> Node 2)
>>
>> · Node2:
>> (Node1s ->) Avro Source -> File Channel -> Custom Sink
>>
>>
>>
>> Both the checkpoint and the data directories of the File Channels are
>> on NFS shares. We use the same share for checkpoint and data
>> directories, but different shares for each Node. Unfortunately it is

NOTICE: Morgan Stanley is not acting as a municipal advisor and the opinions or views contained herein are not intended to be, and do not constitute, advice within the meaning of Section 975 of the Dodd-Frank Wall Street Reform and Consumer Protection Act. If you have received this communication in error, please destroy all electronic and paper copies and notify the sender immediately. Mistransmission is not intended to waive confidentiality or privilege. Morgan Stanley reserves the right, to the extent permitted under applicable law, to monitor electronic communications. This message is subject to terms available at the following link: http://www.morganstanley.com/disclaimers. If you cannot access these links, please notify us by reply message and we will send the contents to you. By messaging with Morgan Stanley you consent to the foregoing.
