Flume user mailing list


FlumeNG Performance Questions
Hi

I am trying to migrate some Flume nodes from FlumeOG to FlumeNG but am
running into a few difficulties. We are writing around 16,000 events/s
from a number of FlumeOG agents to a single FlumeNG agent, but we can't
get the FlumeNG agent to drain its memory channel fast enough. At first I
thought we might be hitting the limit of a single Flume agent, but I get
similar performance with a file channel, which doesn't make sense to me.
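To put these numbers in perspective, here is a back-of-the-envelope sketch of how long a bounded channel lasts when events arrive faster than the sinks remove them. The capacity of 1,000,000 events and the 16,000 events/s ingest rate come from this post; the drain rates are hypothetical values chosen for illustration, and the sketch assumes both rates stay constant.

```python
def seconds_until_full(capacity, ingest_rate, drain_rate, current_size=0):
    """Seconds until a bounded channel is full, or None if the sinks keep up.

    Rates are in events per second; assumes both rates stay constant.
    """
    net_growth = ingest_rate - drain_rate
    if net_growth <= 0:
        return None  # sinks drain at least as fast as events arrive
    return (capacity - current_size) / net_growth


# Capacity and ingest rate are from the post; the drain rates are hypothetical.
for drain in (0, 8_000, 15_000, 16_000):
    print(drain, seconds_until_full(1_000_000, 16_000, drain))
```

Under this constant-rate assumption, even a drain rate of 15,000 events/s (about 94% of the incoming rate) fills the channel in 1,000 seconds, i.e. under 17 minutes.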

I have tried configuring anywhere from a single HDFS sink up to twenty of
them, and I have also tried batch sizes from 1,000 up to 100,000, but no
matter what I do the channel fills fairly quickly.
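One way to reason about why neither more sinks nor larger batches helped is a toy throughput model. This is a simplification under stated assumptions, not how Flume actually schedules sinks: each sink is assumed to pay a fixed per-batch overhead plus a fixed per-event cost, and the hypothetical `shared_cap` parameter stands in for any bottleneck that all sinks share. All cost values below are invented for illustration.

```python
def drain_rate(num_sinks, batch_size, per_batch_overhead_s, per_event_cost_s,
               shared_cap=None):
    """Events/s removed from the channel under a toy model (hypothetical costs).

    Each sink repeatedly takes one batch, paying a fixed per-batch overhead
    plus a per-event cost. `shared_cap` (events/s) models a hypothetical
    bottleneck that all sinks share; None means there is no shared limit.
    """
    batch_time_s = per_batch_overhead_s + batch_size * per_event_cost_s
    rate = num_sinks * batch_size / batch_time_s
    return rate if shared_cap is None else min(rate, shared_cap)


# Hypothetical costs: 0.5 s per batch plus 0.1 ms per event, and a shared
# limit of 12,000 events/s, which is below the 16,000 events/s arriving.
for sinks in (1, 20):
    for batch in (1_000, 100_000):
        uncapped = drain_rate(sinks, batch, 0.5, 0.0001)
        capped = drain_rate(sinks, batch, 0.5, 0.0001, shared_cap=12_000)
        print(f"sinks={sinks:2d} batch={batch:6d} "
              f"uncapped={uncapped:9.0f} capped={capped:6.0f}")
```

In this model, once a shared limit below the incoming rate binds, adding sinks or growing batches no longer raises the drain rate. That would be consistent with the symptoms described, but it is only a hypothesis, not a diagnosis.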

I am running a single flow with the following configuration:

${FLUME_COLLECTOR_ID}.channels.hdfs-memoryChannel.type = memory
${FLUME_COLLECTOR_ID}.channels.hdfs-memoryChannel.capacity = 1000000
${FLUME_COLLECTOR_ID}.channels.hdfs-memoryChannel.transactionCapacity = 100000

${FLUME_COLLECTOR_ID}.sources.perf_legacysource.type = org.apache.flume.source.thriftLegacy.ThriftLegacySource
${FLUME_COLLECTOR_ID}.sources.perf_legacysource.host = 0.0.0.0
${FLUME_COLLECTOR_ID}.sources.perf_legacysource.port = 36892
${FLUME_COLLECTOR_ID}.sources.perf_legacysource.channels = hdfs-memoryChannel
${FLUME_COLLECTOR_ID}.sources.perf_legacysource.selector.type = replicating

${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.type = hdfs
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.path = hdfs://${HADOOP_NAMENODE}:8020/rawLogs/%Y-%m-%d/%H00
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.codeC = com.hadoop.compression.lzo.LzopCodec
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.fileType = CompressedStream
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.rollInterval = 300
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.rollSize = 0
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.threadsPoolSize = 10
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.rollCount = 0
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.batchSize = 50000
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.callTimeout = 120000
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.filePrefix = ${FLUME_COLLECTOR_ID}_1
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.txnEventMax = 1000
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.serializer = text
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.channel = hdfs-memoryChannel
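A quick sanity check on a configuration like this one is that a sink never takes more events in one transaction than the channel's transactionCapacity allows; Flume's memory channel throws a ChannelException when its per-transaction take list is full. The sketch below parses simple Java-properties-style lines (both "key = value" and whitespace-separated "key value"), expands ${...} placeholders, and compares the two values. Assumptions: the agent name collector1 and NameNode host namenode.example are hypothetical stand-ins for the placeholders, and which sink property sets the number of events per transaction has changed between Flume releases, so treating hdfs.batchSize as that value is an assumption to verify against the release in use.

```python
import re

# Hypothetical values for the placeholders used in the configuration.
VARIABLES = {"FLUME_COLLECTOR_ID": "collector1", "HADOOP_NAMENODE": "namenode.example"}

# Adapted from the configuration above; both separator forms are included on purpose.
CONFIG = """
${FLUME_COLLECTOR_ID}.channels.hdfs-memoryChannel.type = memory
${FLUME_COLLECTOR_ID}.channels.hdfs-memoryChannel.capacity = 1000000
${FLUME_COLLECTOR_ID}.channels.hdfs-memoryChannel.transactionCapacity 100000
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.path hdfs://${HADOOP_NAMENODE}:8020/rawLogs/%Y-%m-%d/%H00
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.batchSize = 50000
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.channel = hdfs-memoryChannel
"""

# Key, then one separator ("=", ":" or whitespace) with optional padding, then value.
LINE_RE = re.compile(r"([^\s=:]+)\s*[=:\s]\s*(.*)")
PLACEHOLDER_RE = re.compile(r"\$\{(\w+)\}")


def expand(text, variables):
    """Replace ${NAME} with variables[NAME]; unknown names are left untouched."""
    return PLACEHOLDER_RE.sub(lambda m: variables.get(m.group(1), m.group(0)), text)


def parse_properties(text, variables):
    """Parse simple properties lines; comments (# or !) and blanks are skipped.

    Line continuations and escape sequences are not handled (sketch only).
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        match = LINE_RE.match(line)
        if match:
            key, value = match.groups()
            props[expand(key, variables)] = expand(value, variables)
    return props


def check_transaction_sizes(props, agent, sink, channel):
    """Return warnings if the sink's batch size exceeds transactionCapacity.

    Assumes hdfs.batchSize is the sink's per-transaction event count.
    """
    txn_capacity = int(props[f"{agent}.channels.{channel}.transactionCapacity"])
    batch_size = int(props[f"{agent}.sinks.{sink}.hdfs.batchSize"])
    if batch_size > txn_capacity:
        return [f"{sink}: hdfs.batchSize={batch_size} > "
                f"{channel}.transactionCapacity={txn_capacity}"]
    return []


props = parse_properties(CONFIG, VARIABLES)
print(props["collector1.sinks.hdfs-sink.hdfs.path"])
# prints hdfs://namenode.example:8020/rawLogs/%Y-%m-%d/%H00
print(check_transaction_sizes(props, "collector1", "hdfs-sink", "hdfs-memoryChannel"))
# prints [] because 50000 <= 100000
```

With the values posted here the check passes, so under this assumption the transaction size alone would not explain the channel filling up.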

Thanks

Cameron Gandevia
Brock Noland 2012-11-07, 17:37
Hari Shreedharan 2012-11-07, 17:54