Flume >> mail # user >> FlumeNG Performance Questions

FlumeNG Performance Questions

I am trying to transition some Flume nodes from FlumeOG to FlumeNG but
am running into a few difficulties. We are writing around 16,000 events/s
from a bunch of FlumeOG agents to a FlumeNG agent, but we can't seem to get
the FlumeNG agent to drain the memory channel fast enough. At first I
thought we might be reaching the limit of a single Flume agent, but I get
similar performance using a file channel, which doesn't make sense.

I have tried configuring anywhere from a single HDFS sink up to twenty of
them. I have also tried changing the batch sizes from 1,000 up to 100,000,
but no matter what I do the channel fills fairly quickly.

I am running a single flow using the configuration below:

${FLUME_COLLECTOR_ID}.channels.hdfs-memoryChannel.type = memory
${FLUME_COLLECTOR_ID}.channels.hdfs-memoryChannel.capacity = 1000000
${FLUME_COLLECTOR_ID}.channels.hdfs-memoryChannel.transactionCapacity = 100000

${FLUME_COLLECTOR_ID}.sources.perf_legacysource.type = org.apache.flume.source.thriftLegacy.ThriftLegacySource
${FLUME_COLLECTOR_ID}.sources.perf_legacysource.host =
${FLUME_COLLECTOR_ID}.sources.perf_legacysource.port = 36892
${FLUME_COLLECTOR_ID}.sources.perf_legacysource.channels = hdfs-memoryChannel
${FLUME_COLLECTOR_ID}.sources.perf_legacysource.selector.type = replicating

${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.type = hdfs
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.path = hdfs://${HADOOP_NAMENODE}:8020/rawLogs/%Y-%m-%d/%H00
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.codeC = com.hadoop.compression.lzo.LzopCodec
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.fileType = CompressedStream
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.rollInterval = 300
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.rollSize = 0
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.threadsPoolSize = 10
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.rollCount = 0
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.batchSize = 50000
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.callTimeout = 120000
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.filePrefix = ${FLUME_COLLECTOR_ID}_1
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.txnEventMax = 1000
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.serializer = text
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.channel = hdfs-memoryChannel
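
For the multi-sink runs mentioned above, the layout is roughly the
following. This is only a minimal sketch with two sinks: the sink names
hdfs-sink-1 and hdfs-sink-2 and the _2 file prefix are placeholders for
illustration, and the remaining hdfs.* settings are assumed to be the same
as in the single-sink configuration.

# Illustrative only: two HDFS sinks draining the same channel.
# Sink names and the _2 prefix are hypothetical.
${FLUME_COLLECTOR_ID}.sinks = hdfs-sink-1 hdfs-sink-2

${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-1.type = hdfs
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-1.channel = hdfs-memoryChannel
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-1.hdfs.path = hdfs://${HADOOP_NAMENODE}:8020/rawLogs/%Y-%m-%d/%H00
# Distinct prefix per sink so the sinks do not write to the same file name.
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-1.hdfs.filePrefix = ${FLUME_COLLECTOR_ID}_1

${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-2.type = hdfs
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-2.channel = hdfs-memoryChannel
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-2.hdfs.path = hdfs://${HADOOP_NAMENODE}:8020/rawLogs/%Y-%m-%d/%H00
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-2.hdfs.filePrefix = ${FLUME_COLLECTOR_ID}_2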


Cameron Gandevia