NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Flume >> mail # user >> performance on RecoverableMemoryChannel vs JdbcChannel


Re: performance on RecoverableMemoryChannel vs JdbcChannel
It's the SyslogSource. Since it's an event-driven source, it commits
each Event in its own transaction, one at a time.

Raymond: if possible, try a source that supports batching of events.
We still need to figure out a way to make batching possible for
event-driven sources, but unfortunately that isn't the case at the
moment.
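
To put rough numbers on the batching point, here is a back-of-the-envelope
sketch (not a measurement, and not Flume code). It assumes every commit to a
durable channel pays one fixed cost of about 10 ms, the disk-seek figure from
Brock's arithmetic quoted below, regardless of how many events the
transaction carries. The function name and the batch sizes are made up for
illustration.

```python
# Back-of-the-envelope model: each commit to a durable channel pays one
# fixed cost (assumed here to be a ~10 ms disk seek), no matter how many
# events the transaction carries. The constants are assumptions.

def events_per_second(batch_size, commit_cost_ms=10.0):
    """Upper bound on throughput when each commit costs commit_cost_ms."""
    commits_per_second = 1000.0 / commit_cost_ms
    return commits_per_second * batch_size

# Compare one event per commit against larger (hypothetical) batch sizes.
for batch_size in (1, 10, 100):
    print(batch_size, events_per_second(batch_size))
```

With one event per commit, this model gives 100 events/sec, which lines up
with the ceiling Raymond reports. The larger batch sizes are only an upper
bound, since the model ignores the per-event write cost.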

On 07/13/2012 12:46 AM, Brock Noland wrote:
> Hi,
>
> I would use FileChannel as opposed to RecoverableMemoryChannel.
>
> Also, it sounds like you're not batching somewhere, since without
> batching you will see a disk seek per event. 1000 ms / 100 events =
> 10 ms per event (about a disk seek).
>
> Brock
>
> On Thu, Jul 12, 2012 at 3:55 PM, Raymond Ng <[EMAIL PROTECTED]> wrote:
>> Hi
>>
>> I'm investigating whether I can use Flume to stream syslog data in a
>> production environment, and which channel will give me both durability
>> and performance
>>
>> I've tested using memory channel and the performance is good (i.e. with a
>> 1GB JVM, achieving 9000 events / sec, with 1 agent with a syslog source
>> hopping to another agent which has a hdfs sink)
>>
>> however, durability and recoverability are also important for a
>> production solution, and it seems both the Jdbc and RecoverableMemory
>> channels are significantly slower (no more than 100 events / sec).  Also,
>> the RecoverableMemory channel doesn't seem to resume streaming after the
>> agents are restarted
>>
>> below are my agent configs. Could you advise how I can improve the
>> performance of both the jdbc and recoverableMemory channels? Is it
>> possible to configure them to achieve half the throughput that the memory
>> channel achieves?
>>
>> Agent with Syslog source
>>
>> agent.sources = SysLogSrc
>> #agent.channels = MemChannel
>> #agent.channels = JdbcChannel
>> agent.channels = RecovMemChannel
>> agent.sinks = AvroSink
>>
>> # SysLogSrc
>> agent.sources.SysLogSrc.type = syslogtcp
>> agent.sources.SysLogSrc.host = localhost
>> agent.sources.SysLogSrc.port = 10902
>> #agent.sources.SysLogSrc.channels = MemChannel
>> #agent.sources.SysLogSrc.channels = JdbcChannel
>> agent.sources.SysLogSrc.channels = RecovMemChannel
>> # MemChannel
>> agent.channels.MemChannel.type = memory
>> agent.channels.MemChannel.capacity = 1000000
>> agent.channels.MemChannel.transactionCapacity = 10000
>> agent.channels.MemChannel.keep-alive = 3
>> # JdbcChannel
>> agent.channels.JdbcChannel.type = jdbc
>> agent.channels.JdbcChannel.db.type = DERBY
>> agent.channels.JdbcChannel.driver.class = org.apache.derby.jdbc.EmbeddedDriver
>> agent.channels.JdbcChannel.create.schema = true
>> agent.channels.JdbcChannel.create.index = true
>> agent.channels.JdbcChannel.create.foreignkey = true
>> agent.channels.JdbcChannel.maximum.connections = 10
>> agent.channels.JdbcChannel.maximum.capacity = 0
>> agent.channels.JdbcChannel.sysprop.user.home = /flume/data
>> # RecovMemChannel
>> agent.channels.RecovMemChannel.type = org.apache.flume.channel.recoverable.memory.RecoverableMemoryChannel
>> agent.channels.RecovMemChannel.wal.dataDir = /flume/recoverable-memory-channel
>> agent.channels.RecovMemChannel.wal.rollSize = 104857600
>> agent.channels.RecovMemChannel.wal.minRetentionPeriod = 3600000
>> agent.channels.RecovMemChannel.wal.workerInterval = 5000
>> agent.channels.RecovMemChannel.wal.maxLogsSize = 1073741824
>> agent.channels.RecovMemChannel.capacity = 1000000
>> agent.channels.RecovMemChannel.transactionCapacity = 10000
>> agent.channels.RecovMemChannel.keep-alive = 3
>>
>> # AvroSink
>> agent.sinks.AvroSink.type = avro
>> agent.sinks.AvroSink.hostname = 192.168.200.170
>> agent.sinks.AvroSink.port = 10900
>> agent.sinks.AvroSink.batch-size = 10000
>> #agent.sinks.AvroSink.channel = JdbcChannel
>> #agent.sinks.AvroSink.channel = MemChannel
>> agent.sinks.AvroSink.channel = RecovMemChannel
>>
>>
>> Agent with HDFS sink
>>
>> agent.sources = AvroSrc
>> #agent.channels = MemChannel
>> #agent.channels = JdbcChannel
>> agent.channels = RecovMemChannel