Flume, mail # user - understanding flume performance


Re: understanding flume performance
Denny Ye 2012-07-31, 17:27
Hi Raymond,
     You are correct. FileChannel is the bottleneck, with lower throughput in
my performance report too. Flume's transaction model explains it: an event
must reach the next hop's channel before it can be removed from the current
agent. Thus, the transaction bottleneck in Agent 2 limits the consuming speed
in Agent 1.

    I added some comments inline in your original mail; please take a look.

    I am continuing to tune FileChannel and have already increased its
throughput. I will submit those tuning points to JIRA later.
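
    For reference, a minimal sketch of the FileChannel properties this kind of
tuning typically touches (standard Flume 1.x property names; the agent/channel
names, directories, and values below are only placeholders, not recommended
settings):

    agent.channels.fc.type = file
    # keeping checkpointDir and dataDirs on separate disks generally helps
    agent.channels.fc.checkpointDir = /flume/fc/checkpoint
    agent.channels.fc.dataDirs = /flume/fc/data1,/flume/fc/data2
    agent.channels.fc.capacity = 1000000
    agent.channels.fc.transactionCapacity = 10000
    # how often the channel checkpoints its in-memory queue state (ms)
    agent.channels.fc.checkpointInterval = 30000
    # maximum size of each data file before rolling to a new one (bytes)
    agent.channels.fc.maxFileSize = 2146435071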

-Regards
Denny Ye

2012/7/31 Raymond Ng <[EMAIL PROTECTED]>

> good day all, sorry for the long email
>
> I'd like to know how to gauge where the performance bottleneck is when
> different types of channels are used.
>
> I have a demo environment which looks a bit like this
>
> Setup 1
>
> Agent 1 ( Exec Source, Memory Channel and Avro Sink with 1 GB JVM)
> streaming data to
> Agent 2 ( Avro Source, Memory Channel and HDFS Sink with 1.5 GB JVM)
>
> the memory channels both have 1,000,000 capacity and 10,000 transaction
> capacity, and I managed to achieve ~8000 records/sec in the Exec source
> of Agent 1. I'm not too concerned with how long it takes for Agent 2 to
> insert into HDFS.
>
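> For concreteness, the Agent 2 side of Setup 1 would look roughly like the
> following in Flume properties form (the agent/component names, port, and
> HDFS path are just placeholders):
>
> agent2.sources = avro-in
> agent2.channels = mem-ch
> agent2.sinks = hdfs-out
> agent2.sources.avro-in.type = avro
> agent2.sources.avro-in.bind = 0.0.0.0
> agent2.sources.avro-in.port = 4141
> agent2.sources.avro-in.channels = mem-ch
> agent2.channels.mem-ch.type = memory
> agent2.channels.mem-ch.capacity = 1000000
> agent2.channels.mem-ch.transactionCapacity = 10000
> agent2.sinks.hdfs-out.type = hdfs
> agent2.sinks.hdfs-out.hdfs.path = hdfs://namenode/flume/events
> agent2.sinks.hdfs-out.channel = mem-ch
>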
> and when I changed Agent 2 to use FileChannel
>
> Setup 2
>
>  Agent 1 ( Exec Source, Memory Channel and Avro Sink with 2 GB JVM)
> streaming data to
> Agent 2 ( Avro Source, File Channel and HDFS Sink with 1.0 GB JVM); the
> File Channel has the same capacity and transaction capacity as the memory
> channel stated above.
>
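> In Flume properties form, the Agent 2 channel definition in Setup 2 would
> change roughly as below (again, the names and directories are just
> placeholders):
>
> agent2.channels.file-ch.type = file
> agent2.channels.file-ch.checkpointDir = /flume/file-ch/checkpoint
> agent2.channels.file-ch.dataDirs = /flume/file-ch/data
> agent2.channels.file-ch.capacity = 1000000
> agent2.channels.file-ch.transactionCapacity = 10000
>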
> I've doubled the JVM for Agent 1, knowing that it needs a bigger buffer to
> handle the same throughput from the Exec source, as Agent 2 will be slower,
> buffering records to disk before writing to HDFS.
>
> Now I achieve ~4000 records per second in the Exec source of Agent 1;
> however, I wasn't expecting the Exec source to slow down in throughput, as
> it's getting the same input from tailing the same file.
>
> Is the decrease in source throughput in Agent 1 due to Agent 2 taking much
> longer to commit the events into the file channel, causing a knock-on effect
> on how quickly Agent 1 can release the records from its memory
> channel? [Denny] The answer is yes.
>
> I thought the performance of the source is determined by how quickly it can
> commit events to the channel; the fact that the sink can't consume events as
> quickly as the source puts them in should not affect the speed at which the
> source commits to the channel? [Denny] Once events accumulate in the
> channel, they can affect the put transaction from the source; the reason
> boils down to 'no space left for new incoming events'.
>
> I say this because I have come across a ChannelException suggesting that
> the sinks are not keeping up with the sources, which kind of suggests to me
> that the sink will not slow down the source in terms of channel commits.
>
> hope it makes sense
>
> thanks for any advice
> --
> Rgds
> Ray
>