Re: FlumeNG Performance Questions
Hi Cameron,

It seems like you are hitting performance issues with the HDFS cluster. The HDFS Sink usually performs quite well, so as an experiment, could you try the following (if you have the access, that is):

a) Run multiple Flume agents with the same configuration on different physical machines (with, say, 3 sinks each, or however many it takes to hit the limit).
b) Shut down one of the agents and increase the number of sinks on the remaining agent (maybe double the sinks, or keep adding until the drain rate stops improving; see the sketch below).

Once you have done that, could you let me know your findings? I am trying to figure out whether you are hitting some limit on a single agent.
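A minimal sketch of (b), reusing the configuration you posted below (the hdfs-sink-2 name and its filePrefix are mine, not from your config; each sink needs a distinct filePrefix so the output files don't collide, and each sink drains the shared channel on its own thread):

${FLUME_COLLECTOR_ID}.sinks = hdfs-sink hdfs-sink-2
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-2.type = hdfs
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-2.hdfs.path = hdfs://${HADOOP_NAMENODE}:8020/rawLogs/%Y-%m-%d/%H00
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-2.hdfs.filePrefix = ${FLUME_COLLECTOR_ID}_2
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-2.channel = hdfs-memoryChannel

While you run the experiment, it is also worth watching the channel's ChannelFillPercentage counter; on trunk (and the upcoming 1.3 builds) you can start the agent with -Dflume.monitoring.type=http -Dflume.monitoring.port=34545 to get the metrics as JSON.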
Thanks,
Hari

--
Hari Shreedharan
On Wednesday, November 7, 2012 at 9:37 AM, Brock Noland wrote:

> Hi,
>
> What version of NG are you running? Comments inline below.
> On Tue, Nov 6, 2012 at 8:10 PM, Cameron Gandevia <[EMAIL PROTECTED]> wrote:
> > Hi
> >
> > I am trying to transition some Flume nodes running FlumeOG to FlumeNG, but am running into a few difficulties. We are writing around 16,000 events/s from a number of FlumeOG agents to a single FlumeNG agent, but we can't seem to get the FlumeNG agent to drain the memory channel fast enough. At first I thought we might be reaching the limit of a single Flume agent, but I get similar performance using a file channel, which doesn't make sense.
> >
> > I have tried configuring anywhere from a single HDFS sink up to twenty of them, and I have also tried batch sizes from 1,000 up to 100,000, but no matter what I do the channel fills fairly quickly.
> >
> > I am running a single flow using the configuration below:
> >
> > ${FLUME_COLLECTOR_ID}.channels.hdfs-memoryChannel.type = memory
> > ${FLUME_COLLECTOR_ID}.channels.hdfs-memoryChannel.capacity = 1000000
> > ${FLUME_COLLECTOR_ID}.channels.hdfs-memoryChannel.transactionCapacity = 100000
> >
> >
> > ${FLUME_COLLECTOR_ID}.sources.perf_legacysource.type = org.apache.flume.source.thriftLegacy.ThriftLegacySource
> > ${FLUME_COLLECTOR_ID}.sources.perf_legacysource.host = 0.0.0.0
> > ${FLUME_COLLECTOR_ID}.sources.perf_legacysource.port = 36892
> > ${FLUME_COLLECTOR_ID}.sources.perf_legacysource.channels = hdfs-memoryChannel
> > ${FLUME_COLLECTOR_ID}.sources.perf_legacysource.selector.type = replicating
> >
> >
> > ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.type = hdfs
> > ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.path = hdfs://${HADOOP_NAMENODE}:8020/rawLogs/%Y-%m-%d/%H00
> > ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.codeC = com.hadoop.compression.lzo.LzopCodec
> > ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.fileType = CompressedStream
> > ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.rollInterval = 300
> > ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.rollSize = 0
> > ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.threadsPoolSize = 10
> > ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.rollCount = 0
> > ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.batchSize = 50000
> > ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.callTimeout = 120000
> > ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.filePrefix = ${FLUME_COLLECTOR_ID}_1
> > ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.txnEventMax = 1000
> >
> >
>
>
> I think this should be:
>
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.txnEventMax = 50000
>
> It is spelled wrong there (it needs the hdfs. prefix), and it should be equal to your batch size. I believe we removed that parameter in trunk.
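> To make the pairing explicit, with your current batch size the two lines would read:
>
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.batchSize = 50000
> ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.txnEventMax = 50000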
>  
> > ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.serializer = text
> > ${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.channel = hdfs-memoryChannel
> >
> >
> > Thanks
> >
> > Cameron Gandevia
>
>
>
> --
> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/