

Re: understanding flume performance
Hi Raymond,
     You are correct. FileChannel was also the bottleneck, with lower
throughput, in my performance report. Flume's transaction model explains
why: an event can be removed from the current Agent only after it has been
committed to the next hop's channel. So the transaction bottleneck in
Agent2 limits how fast events can be consumed from Agent1.
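
To make the knock-on effect concrete, here is a rough sketch in Python (not Flume code). It assumes a one-second tick model, a source that simply fills whatever space is free, and a hypothetical function name `simulate`; the rates are only illustrative and borrow the figures from Ray's mail.

```python
def simulate(source_rate, downstream_commit_rate, capacity, seconds):
    """Return how many events the upstream channel accepted in each tick.

    Simplified model (an assumption, not Flume's implementation): events
    leave the upstream channel only as fast as the next hop commits them.
    """
    queued = 0
    accepted_per_tick = []
    for _ in range(seconds):
        # The source can put only as many events as there is free space.
        accepted = min(source_rate, capacity - queued)
        queued += accepted
        # Events are removed once the downstream agent has committed them.
        drained = min(queued, downstream_commit_rate)
        queued -= drained
        accepted_per_tick.append(accepted)
    return accepted_per_tick


fast = simulate(8000, 8000, 1_000_000, 600)  # downstream keeps up
slow = simulate(8000, 4000, 1_000_000, 600)  # downstream commits at half speed
print(fast[-1], slow[0], slow[-1])  # prints: 8000 8000 4000
```

In this model the source runs at full speed until Agent1's channel fills up, and from then on it can only put events as fast as Agent2 commits them.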

    I added some comments inline in your original mail below; please take
a look.

    I'm still tuning FileChannel and have already increased its
throughput. I will submit those tuning points to JIRA later.

-Regards
Denny Ye

2012/7/31 Raymond Ng <[EMAIL PROTECTED]>

> good day all, sorry for the long email
>
> I'd like to know how to gauge where the performance bottleneck is with
> different types of channels used
>
> I have a demo environment which looks a bit like this
>
> Setup 1
>
> Agent 1 ( Exec Source, Memory Channel and Avro Sink with 1 GB JVM)
> streaming data to
> Agent 2 ( Avro Source, Memory Channel and HDFS Sink with 1.5 GB JVM)
>
> both memory channels have a capacity of 1,000,000 and a transaction
> capacity of 10,000, and I managed to achieve ~8000 records/sec in the Exec
> Source of Agent 1. I'm not too concerned with how long it takes for
> Agent 2 to insert into HDFS
>
> and when I changed Agent 2 to use FileChannel
>
> Setup 2
>
>  Agent 1 ( Exec Source, Memory Channel and Avro Sink with 2 GB JVM)
> streaming data to
> Agent 2 ( Avro Source, File Channel and HDFS Sink with 1.0 GB JVM),  the
> File Channel has the same capacity and transaction capacity as the memory
> channel stated above
>
> I've doubled the JVM for Agent 1 knowing that it needs to have a bigger
> buffer to handle the same throughput from the Exec source, as Agent 2 will
> be slower buffering records to disk before writing to HDFS.
>
> now I achieved ~4000 records per second in the Exec source of Agent 1,
> however I wasn't expecting the Exec source to slow down on the throughput
> as it's getting the same input from tailing the same file
>
> Is the decrease in the source throughput in Agent 1 to do with Agent 2
> taking much longer to commit the events into the file channel, which causes
> a knock-on on Agent 1 to release the records from its memory channel?
>
> [Denny] The answer is yes.
>
> I thought the performance on the source is determined by how quickly it
> can commit the events to the channel, the fact that the sink can't
> consume the events as quickly as they are put in by the source should not
> affect the speed the source is committing to the channel?
>
> [Denny] Once events have accumulated in the channel, this can affect the
> put transactions from the Source. The reason can be summarized as
> 'No space left for new coming events'.
>
> I say this because I have come across a ChannelException suggesting that
> the sinks are not keeping up with the sources, which suggests to me that
> the sink will not slow down the source in terms of channel commits
>
> hope it makes sense
>
> thanks for any advice
> --
> Rgds
> Ray
>
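
As a rough illustration of the 'no space left' case discussed above, the following Python sketch models a bounded channel that rejects a put when the batch does not fit. The names `BoundedChannel`, `ChannelFullError` and `run` are hypothetical stand-ins, not Flume's API, and the all-or-nothing batch behaviour is an assumption made to keep the model simple.

```python
class ChannelFullError(Exception):
    """Hypothetical stand-in for the ChannelException mentioned in the mail."""


class BoundedChannel:
    def __init__(self, capacity):
        self.capacity = capacity
        self.events = []

    def put_batch(self, batch):
        # Assumed all-or-nothing: reject the whole batch if it does not fit.
        if len(self.events) + len(batch) > self.capacity:
            raise ChannelFullError("no space left for new events")
        self.events.extend(batch)

    def take_batch(self, max_events):
        # Remove and return up to max_events from the head of the channel.
        taken = self.events[:max_events]
        self.events = self.events[max_events:]
        return taken


def run(source_batch, sink_batch, capacity, rounds):
    """Alternate one source put and one sink take; count rejected puts."""
    channel = BoundedChannel(capacity)
    rejected = 0
    for _ in range(rounds):
        try:
            channel.put_batch(list(range(source_batch)))
        except ChannelFullError:
            rejected += 1  # the source is held back until space frees up
        channel.take_batch(sink_batch)
    return rejected


print(run(10, 10, 100, 50))  # sink keeps up: prints 0
print(run(10, 5, 100, 50))   # sink is slower: prints 16
```

In this model the exception and the slowdown are the same thing seen from two sides: once the channel is full, every rejected put is time the source cannot spend committing new events.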