Re: FlumeNG Performance Questions
Hi,

What version of NG are you running? Comments inline below.

On Tue, Nov 6, 2012 at 8:10 PM, Cameron Gandevia <[EMAIL PROTECTED]> wrote:

> Hi
>
> I am trying to transition some Flume nodes running FlumeOG to FlumeNG, but
> I am running into a few difficulties. We are writing around 16,000 events/s
> from a bunch of FlumeOG agents to a FlumeNG agent, but we can't seem to get
> the FlumeNG agent to drain the memory channel fast enough. At first I
> thought maybe we were reaching the limit of a single Flume agent, but I get
> similar performance using a file channel, which doesn't make sense.
>
> I have tried configuring anywhere from a single HDFS sink up to twenty of
> them, and I have also tried changing the batch sizes from 1,000 up to
> 100,000, but no matter what I do the channel fills fairly quickly.
>
> I am running a single flow using the configuration below:
>
> ${FLUME_COLLECTOR_ID}.channels.hdfs-memoryChannel.type = memory
> ${FLUME_COLLECTOR_ID}.channels.hdfs-memoryChannel.capacity = 1000000
> ${FLUME_COLLECTOR_ID}.channels.hdfs-memoryChannel.transactionCapacity = 100000
>
> ${FLUME_COLLECTOR_ID}.sources.perf_legacysource.type = org.apache.flume.source.thriftLegacy.ThriftLegacySource
> ${FLUME_COLLECTOR_ID}.sources.perf_legacysource.host = 0.0.0.0
> ${FLUME_COLLECTOR_ID}.sources.perf_legacysource.port = 36892
> ${FLUME_COLLECTOR_ID}.sources.perf_legacysource.channels = hdfs-memoryChannel
> ${FLUME_COLLECTOR_ID}.sources.perf_legacysource.selector.type = replicating
>
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.type = hdfs
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.path = hdfs://${HADOOP_NAMENODE}:8020/rawLogs/%Y-%m-%d/%H00
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.codeC = com.hadoop.compression.lzo.LzopCodec
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.fileType = CompressedStream
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.rollInterval = 300
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.rollSize = 0
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.threadsPoolSize = 10
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.rollCount = 0
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.batchSize = 50000
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.callTimeout = 120000
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.filePrefix = ${FLUME_COLLECTOR_ID}_1
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.txnEventMax = 1000
>

I think this should be:

${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.txnEventMax = 50000

The property name is misspelled (it is missing the "hdfs." prefix), and the
value should be equal to your batch size. I believe we removed that
parameter in trunk.
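
In other words, with your current batch size the two settings should line
up like this (a sketch against the 1.x hdfs sink property names, untested):

${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.batchSize = 50000
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.txnEventMax = 50000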
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.serializer = text
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.channel = hdfs-memoryChannel
>
> Thanks
>
> Cameron Gandevia
>
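
On the multiple-sink experiments: each sink drains the channel on its own
SinkRunner thread, so one way to scale the drain rate is to declare several
sinks on the same channel. A rough sketch with two sinks (hypothetical sink
names, untested):

# Two HDFS sinks draining the same channel in parallel;
# each sink gets its own drain thread.
${FLUME_COLLECTOR_ID}.sinks = hdfs-sink-1 hdfs-sink-2

${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-1.type = hdfs
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-1.channel = hdfs-memoryChannel
# Distinct file prefixes so the two sinks don't collide on file names.
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-1.hdfs.filePrefix = ${FLUME_COLLECTOR_ID}_1

${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-2.type = hdfs
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-2.channel = hdfs-memoryChannel
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-2.hdfs.filePrefix = ${FLUME_COLLECTOR_ID}_2
# ...plus the same hdfs.* settings as in your config, repeated per sink.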

--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/