Re: FlumeNG Performance Questions
Hi,

What version of NG are you running? Comments inline below.

On Tue, Nov 6, 2012 at 8:10 PM, Cameron Gandevia <[EMAIL PROTECTED]> wrote:

> Hi
>
> I am trying to transition some flume nodes running FlumeOG to FlumeNG but
> am running into a few difficulties. We are writing around 16,000 events/s
> from a bunch of FlumeOG agents to a FlumeNG agent but we can't seem to get
> the FlumeNG agent to drain the memory channel fast enough. At first I
> thought maybe we were reaching the limit of a single Flume agent but I get
> similar performance using a file channel, which doesn't make sense.
>
> I have tried configuring anywhere from a single HDFS sink up to twenty of
> them, and I have also tried changing the batch size from 1,000 up to
> 100,000, but no matter what I do the channel fills fairly quickly.
>
> I am running a single flow using the configuration below:
>
> ${FLUME_COLLECTOR_ID}.channels.hdfs-memoryChannel.type = memory
> ${FLUME_COLLECTOR_ID}.channels.hdfs-memoryChannel.capacity = 1000000
> ${FLUME_COLLECTOR_ID}.channels.hdfs-memoryChannel.transactionCapacity = 100000
>
> ${FLUME_COLLECTOR_ID}.sources.perf_legacysource.type = org.apache.flume.source.thriftLegacy.ThriftLegacySource
> ${FLUME_COLLECTOR_ID}.sources.perf_legacysource.host = 0.0.0.0
> ${FLUME_COLLECTOR_ID}.sources.perf_legacysource.port = 36892
> ${FLUME_COLLECTOR_ID}.sources.perf_legacysource.channels = hdfs-memoryChannel
> ${FLUME_COLLECTOR_ID}.sources.perf_legacysource.selector.type = replicating
>
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.type = hdfs
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.path = hdfs://${HADOOP_NAMENODE}:8020/rawLogs/%Y-%m-%d/%H00
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.codeC = com.hadoop.compression.lzo.LzopCodec
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.fileType = CompressedStream
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.rollInterval = 300
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.rollSize = 0
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.threadsPoolSize = 10
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.rollCount = 0
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.batchSize = 50000
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.callTimeout = 120000
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.filePrefix = ${FLUME_COLLECTOR_ID}_1
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.txnEventMax = 1000
>

I think this should be:

${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.txnEventMax = 50000

The name is misspelled there (it's missing the hdfs. prefix, so your setting
isn't taking effect), and it should be equal to your batch size. I believe we
removed that parameter in trunk.
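
For example, with a 50,000-event batch the related settings would line up
like this (a sketch using your existing property names; note the memory
channel's transactionCapacity has to be at least as large as the sink's
batch size, which your config already satisfies):

${FLUME_COLLECTOR_ID}.channels.hdfs-memoryChannel.transactionCapacity = 100000
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.batchSize = 50000
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.txnEventMax = 50000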
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.serializer = text
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.channel = hdfs-memoryChannel
>
> Thanks
>
> Cameron Gandevia
>
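
Also, since you mentioned trying multiple sinks: in NG each sink is driven by
its own thread, so attaching several HDFS sinks to the same channel is the
usual way to parallelize draining it. A rough sketch with two sinks
(hdfs-sink1 and hdfs-sink2 are hypothetical names; each one needs the full
set of hdfs.* properties from your config, with distinct filePrefix values so
they don't write to the same files):

${FLUME_COLLECTOR_ID}.sinks = hdfs-sink1 hdfs-sink2
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink1.type = hdfs
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink1.channel = hdfs-memoryChannel
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink1.hdfs.filePrefix = ${FLUME_COLLECTOR_ID}_1
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink2.type = hdfs
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink2.channel = hdfs-memoryChannel
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink2.hdfs.filePrefix = ${FLUME_COLLECTOR_ID}_2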

--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/