Flume >> mail # user >> Best way to increase throughput of Exec->Memory->Avro agent.


Best way to increase throughput of Exec->Memory->Avro agent.
Hi all.

I've been working on this for quite some time and need some advice from
the experts.  I have a two-tiered Flume architecture:

App Tier (all on one server):
 124 ExecSources -> MemoryChannel -> AvroSinks

HDFS Tier (on two servers):
  AvroSource -> FileChannel -> HDFSSinks

When I run the agents, the HDFS tier keeps up fine with the App Tier.
Its queue sizes stay between 0 and 10000 (I have a batch size of 10000).
All is good.

On the App Tier, when I view the JMX data through jconsole, I watch the
size of the MemoryChannel grow steadily until it reaches the max, then it
starts throwing exceptions about not being able to put the batch on the
channel as expected.

There seem to be two basic ways to increase the throughput of the App Tier:
1.  Increase the MemoryChannel's transactionCapacity and the corresponding
AvroSink's batch-size.  Both are set to 10000 for me.
2.  Increase the number of AvroSinks to drain the MemoryChannel.  I'm up to
64 Sinks now which round-robin between the two Flume Agents on the HDFS
tier.
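
For reference, here is a minimal sketch of the App Tier settings described
above, written as a Flume properties fragment.  The agent name (app), the
component names (mem, k1, k2), the host names, the port and the channel
capacity value are placeholders I made up for illustration, not my actual
configuration; only the two 10000 values come from the description above.
Sources are omitted, and only two of the sinks are shown.

```properties
# Hypothetical names throughout: agent "app", channel "mem", sinks "k1" and "k2".
app.channels = mem
app.sinks = k1 k2

# MemoryChannel: "capacity" is the maximum number of events held in the channel,
# "transactionCapacity" is the maximum number of events per put/take transaction.
app.channels.mem.type = memory
# Placeholder value, not taken from the post.
app.channels.mem.capacity = 1000000
app.channels.mem.transactionCapacity = 10000

# AvroSinks draining the same channel, alternating between the two HDFS-tier agents.
# Host names and port are placeholders.
app.sinks.k1.type = avro
app.sinks.k1.channel = mem
app.sinks.k1.hostname = hdfs-tier-1.example.com
app.sinks.k1.port = 4545
app.sinks.k1.batch-size = 10000

app.sinks.k2.type = avro
app.sinks.k2.channel = mem
app.sinks.k2.hostname = hdfs-tier-2.example.com
app.sinks.k2.port = 4545
app.sinks.k2.batch-size = 10000
```

In this sketch the sinks are independent (no sink group), so each one drains
the channel on its own, and the batch-size matches the channel's
transactionCapacity.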

Both of those values seem quite high to me (batch size and number of
sinks).

Am I missing something as far as tuning?
Which would give the greater increase in throughput: more Sinks or a larger
batch size?

I'm stumped here.  I still think I can get this to work. :)

Any suggestions are most welcome.
Thanks for your time.
Chris