Flume >> mail # dev >> Low throughput of FileChannel


Re: Low throughput of FileChannel
hi Hari,
    Most channels in my production environment will be configured as
FileChannel, so its performance may impact our platform. Also, I'm not sure
whether anyone has already achieved better throughput. If anyone has results
similar to mine, I'd like to post my code changes for discussion.

-Regards
Denny Ye

2012/8/3 Hari Shreedharan <[EMAIL PROTECTED]>

> Denny,
>
> I am not sure if anyone has actually benchmarked the FileChannel. What
> kind of performance are you getting as of now? If you have a patch that can
> improve the performance a lot, please feel free to submit it. We'd
> definitely like to get such a patch committed.
>
> Thanks
> Hari
>
> --
> Hari Shreedharan
>
>
> On Thursday, August 2, 2012 at 8:02 PM, Denny Ye wrote:
>
> > hi all,
> >     I posted the performance of MemoryChannel last week; that is normal
> throughput in most environments. By contrast, the FileChannel result is
> below expectation with the same environment and parameters: almost 5 MB/s.
> >
> >     I would specifically like to know your throughput results for
> FileChannel. Am I doing something wrong? The result is hard to believe.
> >
> >    After tuning with several code changes, the throughput increased to
> 30 MB/s. I think there are still many points that affect the
> performance.
> >
> >     Could anyone share your throughput results or feedback on
> tuning?
> >
> > -Regards
> > Denny Ye
> >
> >
> > ---------- Forwarded message ----------
> > From: Denny Ye <[EMAIL PROTECTED] (mailto:[EMAIL PROTECTED])>
> > Date: 2012/7/25
> > Subject: Latest Flume test report and problem
> > To: [EMAIL PROTECTED] (mailto:[EMAIL PROTECTED])
> >
> >
> > hi all,
> >    I tested Flume last week with ScribeSource (
> https://issues.apache.org/jira/browse/FLUME-1382) and HDFS Sink. More
> detailed conditions and deployment cases are listed below. Too many 'Full GC'
> pauses hurt the throughput, and a large number of events are promoted into
> the old generation. I have applied some tuning methods, with little effect.
> >    Could someone give me feedback or a tip to reduce the GC problem?
> Thanks for your attention.
> >
> > PS: Using Mike's report template at
> https://cwiki.apache.org/FLUME/flume-ng-performance-measurements.html
> >
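To chase the 'Full GC' question above, a usual first step is to turn on GC logging and confirm how much memory is promoted per collection. This is a sketch only: the flags are standard HotSpot 1.6 logging options added on top of the JAVA OPTS quoted in the report below, and `conf/flume-env.sh` is an assumed default Flume NG location, not something stated in this thread.

```shell
# Sketch: append GC diagnostics to the agent's JVM options (e.g. in
# conf/flume-env.sh -- assumed location). These flags only add logging;
# they do not change collector behavior.
export JAVA_OPTS="$JAVA_OPTS \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -XX:+PrintTenuringDistribution \
  -Xloggc:/tmp/flume-gc.log"
```

The tenuring distribution in the resulting log shows whether events survive long enough to be promoted despite the large young generation implied by -XX:NewRatio=1 and -XX:SurvivorRatio=5.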
> > Flume Performance Test 2012-07-25
> > Overview
> > The Flume agent was run on its own physical machine in a single JVM. A
> separate client machine generated load against the Flume box in
> List<LogEntry> format. Flume stored data onto a 4-node HDFS cluster
> configured on its own separate hardware. No virtual machines were used in
> this test.
> > Hardware specs
> > CPU: Intel Xeon L5640, 2 x six-core @ 2.27 GHz (12 physical cores)
> > Memory: 16 GB
> > OS: CentOS release 5.3 (Final)
> > Flume configuration
> > JAVA Version: 1.6.0_20 (Java HotSpot 64-Bit Server VM)
> > JAVA OPTS: -Xms1024m -Xmx4096m -XX:PermSize=256m -XX:NewRatio=1
> -XX:SurvivorRatio=5 -XX:InitialTenuringThreshold=15
> -XX:MaxTenuringThreshold=31 -XX:PretenureSizeThreshold=4096
> > Num. agents: 1
> > Num. parallel flows: 5
> > Source: ScribeSource
> > Channel: MemoryChannel
> > Sink: HDFSEventSink
> > Selector: RandomSelector
> > Config-file
> > # list sources, channels, sinks for the agent
> > agent.sources = seqGenSrc
> > agent.channels = mc1 mc2 mc3 mc4 mc5
> > agent.sinks = hdfsSin1 hdfsSin2 hdfsSin3 hdfsSin4 hdfsSin5
> >
> > # define sources
> > agent.sources.seqGenSrc.type = org.apache.flume.source.scribe.ScribeSource
> > agent.sources.seqGenSrc.selector.type = io.flume.RandomSelector
> >
> > # define sinks
> > agent.sinks.hdfsSin1.type = hdfs
> > agent.sinks.hdfsSin1.hdfs.path = /flume_test/data1/
> > agent.sinks.hdfsSin1.hdfs.rollInterval = 300
> > agent.sinks.hdfsSin1.hdfs.rollSize = 0
> > agent.sinks.hdfsSin1.hdfs.rollCount = 1000000
> > agent.sinks.hdfsSin1.hdfs.batchSize = 10000
> > agent.sinks.hdfsSin1.hdfs.fileType = DataStream
> > agent.sinks.hdfsSin1.hdfs.txnEventMax = 1000
> > # ... define sink #2 #3 #4 #5 ...
> >
> > # define channels
> > agent.channels.mc1.type = memory
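The quoted config-file is cut off after the first channel line. Purely as a sketch of how the remaining sections of a Flume NG properties file typically look (the capacity values below are illustrative assumptions, not the reporter's actual settings):

```
# define channels (continuation sketch -- values are assumptions)
agent.channels.mc1.capacity = 100000
agent.channels.mc1.transactionCapacity = 10000
# ... define channel mc2 mc3 mc4 mc5 ...

# bind sources and sinks to channels
agent.sources.seqGenSrc.channels = mc1 mc2 mc3 mc4 mc5
agent.sinks.hdfsSin1.channel = mc1
# ... bind sink #2 #3 #4 #5 to mc2..mc5 ...
```

The source-to-channels and sink-to-channel bindings are required for any Flume NG agent to start, which is why they are sketched here even though they fall outside the quoted fragment.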