Flume >> mail # dev >> Latest Flume test report and problem


Re: Latest Flume test report and problem
hi Alex,
    The attachment is being prepared for you!
    Long GC pauses may be the critical problem for us. Do you agree?
    Looking forward to your response, thanks!

-Regards
Denny Ye
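The long pauses in question can be tallied from the gc.log that the flags quoted below will produce. A minimal sketch, assuming the usual HotSpot -XX:+PrintGCDetails line format (the exact layout varies by JVM version, so treat this as illustrative):

```python
import re

# Count Full GC events and sum their pause times from a HotSpot gc.log.
# ASSUMPTION: lines look like "[Full GC ... , 0.1234560 secs]".
FULL_GC = re.compile(r"\[Full GC .*?([\d.]+) secs\]")

def summarize(lines):
    pauses = [float(m.group(1)) for line in lines
              for m in FULL_GC.finditer(line)]
    return len(pauses), sum(pauses)

sample = [
    "2012-07-25T10:35:12: [Full GC [CMS: 150K->100K(200K), 0.1234560 secs] 1K->1K(2K)]",
    "2012-07-25T10:35:20: [GC [ParNew: 10K->1K(32K), 0.0010000 secs]",
]
count, total = summarize(sample)
print(count, total)
```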

2012/7/25 alo.alt <[EMAIL PROTECTED]>

> Hey Denny,
>
> thanks for the report.
>
> Can you please try to rerun with:
>
> JAVA_OPTS="-Xms200m -Xmx200m -Xmn32m -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -Xss128k
> -XX:+UseMembar -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> -Xloggc:/var/log/flume/gc.log"
>
> Please attach the gc.log afterwards.
>
> cheers,
> Alex
>
> On 25.07.2012 10:35, Denny Ye wrote:
> > hi all,
> >
> >    I tested Flume last week with ScribeSource (
> > https://issues.apache.org/jira/browse/FLUME-1382) and HDFS Sink. The
> > detailed conditions and deployment cases are listed below. Too many
> > 'Full GC' pauses impact the throughput, and a large amount of events is
> > promoted into the old generation. I have applied some tuning methods,
> > with not much effect.
> >
> >    Could someone give me feedback or a tip to reduce the GC problem?
> > Thanks for your attention.
> >
> >
> > PS: Using Mike's report template at
> > https://cwiki.apache.org/FLUME/flume-ng-performance-measurements.html
> >
> >
> > *Flume Performance Test 2012-07-25*
> >
> > *Overview*
> >
> > The Flume agent was run on its own physical machine in a single JVM. A
> > separate client machine generated load against the Flume box in
> > List<LogEntry> format. Flume stored data onto a 4-node HDFS cluster
> > configured on its own separate hardware. No virtual machines were used in
> > this test.
> >
> > *Hardware specs*
> >
> > CPU: Intel Xeon L5640, 2 x hex-core @ 2.27 GHz (12 physical cores)
> >
> > Memory: 16 GB
> >
> > OS: CentOS release 5.3 (Final)
> >
> > *Flume configuration*
> >
> > JAVA Version: 1.6.0_20 (Java HotSpot 64-Bit Server VM)
> >
> > JAVA OPTS: -Xms1024m -Xmx4096m -XX:PermSize=256m -XX:NewRatio=1
> > -XX:SurvivorRatio=5 -XX:InitialTenuringThreshold=15
> > -XX:MaxTenuringThreshold=31 -XX:PretenureSizeThreshold=4096
> >
> > Num. agents: 1
> >
> > Num. parallel flows: 5
> >
> > Source: ScribeSource
> >
> > Channel: MemoryChannel
> >
> > Sink: HDFSEventSink
> >
> > Selector: RandomSelector
> >
> > *Config-file*
> >
> > # list sources, channels, sinks for the agent
> >
> > agent.sources = seqGenSrc
> >
> > agent.channels = mc1 mc2 mc3 mc4 mc5
> >
> > agent.sinks = hdfsSin1 hdfsSin2 hdfsSin3 hdfsSin4 hdfsSin5
> >
> >
> >
> > # define sources
> >
> > agent.sources.seqGenSrc.type = org.apache.flume.source.scribe.ScribeSource
> >
> > agent.sources.seqGenSrc.selector.type = io.flume.RandomSelector
> >
> >
> >
> > # define sinks
> >
> > agent.sinks.hdfsSin1.type = hdfs
> >
> > agent.sinks.hdfsSin1.hdfs.path = /flume_test/data1/
> >
> > agent.sinks.hdfsSin1.hdfs.rollInterval = 300
> >
> > agent.sinks.hdfsSin1.hdfs.rollSize = 0
> >
> > agent.sinks.hdfsSin1.hdfs.rollCount = 1000000
> >
> > agent.sinks.hdfsSin1.hdfs.batchSize = 10000
> >
> > agent.sinks.hdfsSin1.hdfs.fileType = DataStream
> >
> > agent.sinks.hdfsSin1.hdfs.txnEventMax = 1000
> >
> > # ... define sink #2 #3 #4 #5 ...
> >
> >
> >
> > # define channels
> >
> > agent.channels.mc1.type = memory
> >
> > agent.channels.mc1.capacity = 1000000
> >
> > agent.channels.mc1.transactionCapacity = 1000
> >
> > # ... define channel #2 #3 #4 #5 ...
> >
> >
> >
> > # specify the channel each sink and source should use
> >
> > agent.sources.seqGenSrc.channels = mc1 mc2 mc3 mc4 mc5
> >
> > agent.sinks.hdfsSin1.channel = mc1
> >
> > # ... specify sink #2 #3 #4 #5 ...
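As a back-of-envelope check on the MemoryChannel sizing above: five channels of capacity 1,000,000 can pin several GB of heap on their own. A sketch, assuming ~500 bytes per in-heap event (a figure not stated in the report; the real size depends on the LogEntry payload plus object overhead):

```python
# Worst-case heap held by the MemoryChannels when they are full.
channels = 5
capacity_per_channel = 1_000_000   # agent.channels.mcN.capacity
avg_event_bytes = 500              # assumed, not from the report

worst_case_bytes = channels * capacity_per_channel * avg_event_bytes
print(worst_case_bytes / 2**30)    # roughly 2.3 GiB
```

With -Xmx4096m and -XX:NewRatio=1 the old generation is about 2 GB, so full channels alone can exceed it, which would account for frequent Full GCs.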
> >
> > *Hadoop configuration*
> >
> > The HDFS sinks were connected to a 4-node Hadoop cluster running CDH3u1.
> > Each HDFS sink wrote its data into a different path.
> >
> > *Visualization of test setup*
> >
> >
> https://lh3.googleusercontent.com/dGumq1pu1Wr3Bj8WJmRHOoLWmUlGqxC4wW7_XCNO9R1wuh15LRXaKKxGoccpjBXtgqcdSVW-vtg
> >
> > There are 10 Scribe clients, and each client sends 20 million LogEntry
> > objects to the ScribeSource.
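The total load implied by this setup works out as follows (a sketch; the even per-sink split assumes RandomSelector distributes events uniformly across the five flows):

```python
# Aggregate test load from the figures in the report.
clients = 10
events_per_client = 20_000_000
sinks = 5
batch_size = 10_000  # agent.sinks.hdfsSinN.hdfs.batchSize

total_events = clients * events_per_client
per_sink = total_events // sinks          # assuming a uniform random split
batches_per_sink = per_sink // batch_size
print(total_events, per_sink, batches_per_sink)
```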