Re: Latest Flume test report and problem
Denny Ye 2012-07-25, 09:48
hi Alex,
    The attachment is being prepared for you!
    The long GC pauses may be the critical problem for us. Do you agree?
    Looking forward to your response, thanks!

-Regards
Denny Ye

2012/7/25 alo.alt <[EMAIL PROTECTED]>

> Hey Denny,
>
> thanks for the report.
>
> Can you please try to rerun with:
>
> JAVA_OPTS="-Xms200m -Xmx200m -Xmn32m -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -Xss128k
> -XX:+UseMembar -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> -Xloggc:/var/log/flume/gc.log"
>
> Please attach the gc.log afterwards.
>
> cheers,
> Alex
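
Where these options go: in Flume NG they would normally be set in
conf/flume-env.sh, which bin/flume-ng sources at startup. A minimal sketch
assuming the stock file layout, with the flags as suggested above:

  # conf/flume-env.sh
  # Fixed 200m heap (-Xms == -Xmx avoids resize pauses), 32m young gen,
  # ParNew + CMS collectors with the CMS cycle starting at 70% old-gen
  # occupancy, and GC logging enabled so the requested gc.log is produced.
  JAVA_OPTS="-Xms200m -Xmx200m -Xmn32m -XX:+UseParNewGC
  -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -Xss128k
  -XX:+UseMembar -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
  -Xloggc:/var/log/flume/gc.log"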
>
> On 25.07.2012 10:35, Denny Ye wrote:
> > hi all,
> >
> >    I tested Flume last week with the ScribeSource (
> > https://issues.apache.org/jira/browse/FLUME-1382) and the HDFS sink. More
> > detailed conditions and deployment cases are listed below. Too many 'Full
> > GC' pauses impact the throughput, and a large number of events are being
> > promoted into the old generation. I have applied some tuning methods,
> > with little effect.
> >
> >    Could someone give me feedback or a tip to reduce the GC problem?
> > I would appreciate your attention.
> >
> >
> > PS: Using Mike's report template at
> > https://cwiki.apache.org/FLUME/flume-ng-performance-measurements.html
> >
> >
> > *Flume Performance Test 2012-07-25*
> >
> > *Overview*
> >
> > The Flume agent was run on its own physical machine in a single JVM. A
> > separate client machine generated load against the Flume box, sending
> > data in List<LogEntry> format. Flume stored data on a 4-node HDFS cluster
> > configured on its own separate hardware. No virtual machines were used in
> > this test.
> >
> > *Hardware specs*
> >
> > CPU: Intel Xeon L5640, 2 x 6-core @ 2.27 GHz (12 physical cores)
> >
> > Memory: 16 GB
> >
> > OS: CentOS release 5.3 (Final)
> >
> > *Flume configuration*
> >
> > JAVA Version: 1.6.0_20 (Java HotSpot 64-Bit Server VM)
> >
> > JAVA OPTS: -Xms1024m -Xmx4096m -XX:PermSize=256m -XX:NewRatio=1
> > -XX:SurvivorRatio=5 -XX:InitialTenuringThreshold=15
> > -XX:MaxTenuringThreshold=31 -XX:PretenureSizeThreshold=4096
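
A worked sizing note on these flags, assuming standard HotSpot behavior:
with -XX:NewRatio=1 the young generation gets half the heap (~2048 MB at the
4096m maximum), and -XX:SurvivorRatio=5 splits that 5:1:1 into roughly a
1463 MB eden plus two ~293 MB survivor spaces; since -Xms1024m is far below
-Xmx4096m, the heap also resizes under load. Note too that HotSpot's object
age field is 4 bits, so -XX:MaxTenuringThreshold=31 is effectively clamped
to 15 on this JVM.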
> >
> > Num. agents: 1
> >
> > Num. parallel flows: 5
> >
> > Source: ScribeSource
> >
> > Channel: MemoryChannel
> >
> > Sink: HDFSEventSink
> >
> > Selector: RandomSelector
> >
> > *Config-file*
> >
> > # list sources, channels, sinks for the agent
> >
> > agent.sources = seqGenSrc
> >
> > agent.channels = mc1 mc2 mc3 mc4 mc5
> >
> > agent.sinks = hdfsSin1 hdfsSin2 hdfsSin3 hdfsSin4 hdfsSin5
> >
> >
> >
> > # define sources
> >
> > agent.sources.seqGenSrc.type = org.apache.flume.source.scribe.ScribeSource
> >
> > agent.sources.seqGenSrc.selector.type = io.flume.RandomSelector
> >
> >
> >
> > # define sinks
> >
> > agent.sinks.hdfsSin1.type = hdfs
> >
> > agent.sinks.hdfsSin1.hdfs.path = /flume_test/data1/
> >
> > agent.sinks.hdfsSin1.hdfs.rollInterval = 300
> >
> > agent.sinks.hdfsSin1.hdfs.rollSize = 0
> >
> > agent.sinks.hdfsSin1.hdfs.rollCount = 1000000
> >
> > agent.sinks.hdfsSin1.hdfs.batchSize = 10000
> >
> > agent.sinks.hdfsSin1.hdfs.fileType = DataStream
> >
> > agent.sinks.hdfsSin1.hdfs.txnEventMax = 1000
> >
> > # ... define sink #2 #3 #4 #5 ...
> >
> >
> >
> > # define channels
> >
> > agent.channels.mc1.type = memory
> >
> > agent.channels.mc1.capacity = 1000000
> >
> > agent.channels.mc1.transactionCapacity = 1000
> >
> > # ... define channel #2 #3 #4 #5 ...
> >
> >
> >
> > # specify the channel each sink and source should use
> >
> > agent.sources.seqGenSrc.channels = mc1 mc2 mc3 mc4 mc5
> >
> > agent.sinks.hdfsSin1.channel = mc1
> >
> > # ... specify sink #2 #3 #4 #5 ...
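
One note on channel sizing relevant to the GC question: five MemoryChannels
with capacity = 1000000 can hold up to 5 million events on the heap at once.
As a rough illustration (the per-event size is an assumption), at a few
hundred bytes per buffered event a backed-up channel set alone would approach
the 4 GB -Xmx, which is consistent with heavy promotion into the old
generation and frequent full GCs.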
> >
> > *Hadoop configuration*
> >
> > The HDFS sinks were connected to a 4-node Hadoop cluster running CDH3u1.
> > Each HDFS sink wrote data into a different path.
> >
> > *Visualization of test setup*
> >
> >
> > https://lh3.googleusercontent.com/dGumq1pu1Wr3Bj8WJmRHOoLWmUlGqxC4wW7_XCNO9R1wuh15LRXaKKxGoccpjBXtgqcdSVW-vtg
> >
> > There are 10 Scribe clients, and each client sends 20 million LogEntry
> > objects to the ScribeSource.
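
For anyone reproducing the load-generator side, here is a minimal sketch of
a Scribe client pushing List<LogEntry> batches at the ScribeSource. The
package name follows the Thrift-generated classes added by FLUME-1382; the
host name, port, and batch size are placeholder assumptions:

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.thrift.protocol.TBinaryProtocol;
  import org.apache.thrift.transport.TFramedTransport;
  import org.apache.thrift.transport.TSocket;
  import org.apache.flume.source.scribe.LogEntry;  // assumed package, per FLUME-1382
  import org.apache.flume.source.scribe.Scribe;

  public class ScribeLoadClient {
    public static void main(String[] args) throws Exception {
      // ScribeSource speaks framed Thrift; host and port are placeholders
      TSocket socket = new TSocket("flume-host", 1463);
      TFramedTransport transport = new TFramedTransport(socket);
      Scribe.Client client = new Scribe.Client(new TBinaryProtocol(transport));
      transport.open();
      List<LogEntry> batch = new ArrayList<LogEntry>(1000);
      for (long i = 0; i < 20000000L; i++) {  // 20M entries per client, as in the test
        batch.add(new LogEntry("test", "payload-" + i));
        if (batch.size() == 1000) {           // send in batches of 1000
          client.Log(batch);                  // scribe.thrift: ResultCode Log(list<LogEntry>)
          batch.clear();
        }
      }
      if (!batch.isEmpty()) {
        client.Log(batch);                    // flush the final partial batch
      }
      transport.close();
    }
  }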