Jagadish Bihani 2012-12-12, 10:05
Jagadish Bihani 2012-12-12, 10:08
Brock Noland 2012-12-12, 15:36
Hari Shreedharan 2012-12-12, 17:53
Bhaskar V. Karambelkar 2012-12-12, 21:13
Hari Shreedharan 2012-12-12, 21:44
Jagadish Bihani 2012-12-18, 11:05
Re: Recommendation of parameters for better performance with File Channel
Juhani Connolly 2012-12-19, 09:23
Hi Jagadish,
You may want to check out the thread "Re: Flume 1.3.0 - NFS + File Channel Performance". It turns out that the changes in FLUME-1609 affect FileChannel performance a fair bit, even on normal non-NFS file systems. We ran a version of 1.3 built from an earlier trunk and took a big performance hit when we switched to the 1.3 release. I isolated it to the FLUME-1609 patch. After building and installing the 1.4 trunk, performance was back to normal.

On 12/18/2012 08:05 PM, Jagadish Bihani wrote:
> Hi
>
> Thanks for the inputs, Hari and Brock.
> I tried a batch size of 10000, and throughput increased from 1.5 to
> 1.8 MB/sec.
> Then I used multiple HDFS sinks reading from the same channel and got
> around 2.3 MB/sec.
>
> Regards,
> Jagadish
>
> On 12/13/2012 03:14 AM, Hari Shreedharan wrote:
>> Yep, each sink with a different prefix will work fine too. My
>> suggestion was just meant to avoid collisions; file prefixes are good
>> enough for that.
>>
>> --
>> Hari Shreedharan
>>
>> On Wednesday, December 12, 2012 at 1:13 PM, Bhaskar V. Karambelkar wrote:
>>
>>> Hari,
>>> If each sink uses a different file prefix, what's the need to write to
>>> multiple HDFS directories?
>>> All our sinks write to the same HDFS directory, each with a unique
>>> file prefix, and it seems to work fine.
>>> I also haven't found anything in the Flume code or HDFS APIs which
>>> suggests that two sinks can't write to the same directory.
>>>
>>> Just curious.
>>> Thanks
>>>
>>> On Wed, Dec 12, 2012 at 12:53 PM, Hari Shreedharan
>>> <[EMAIL PROTECTED]> wrote:
>>>> Also note that having multiple sinks often improves performance,
>>>> though you should have each sink write to a different directory on
>>>> HDFS. Since each sink really uses only one thread at a time to write,
>>>> having multiple sinks allows multiple threads to write to HDFS.
>>>> Also, if you can spare additional disks on your Flume agent machine
>>>> for file channel data directories, that will also improve performance.
>>>>
>>>> Hari
>>>>
>>>> --
>>>> Hari Shreedharan
>>>>
>>>> On Wednesday, December 12, 2012 at 7:36 AM, Brock Noland wrote:
>>>>
>>>> Hi,
>>>>
>>>> Why not try increasing the batch size on the source and sink to 10,000?
>>>>
>>>> Brock
>>>>
>>>> On Wed, Dec 12, 2012 at 4:08 AM, Jagadish Bihani
>>>> <[EMAIL PROTECTED]> wrote:
>>>>
>>>> I am using the latest release of Flume (1.3.0) and Hadoop 1.0.3.
>>>>
>>>> On 12/12/2012 03:35 PM, Jagadish Bihani wrote:
>>>>
>>>> Hi
>>>>
>>>> I am able to write at most 1.5 MB/sec of data to HDFS (without
>>>> compression) using File Channel. Are there any recommendations to
>>>> improve the performance? Has anybody achieved around 10 MB/sec with
>>>> File Channel? If so, please share the configuration (hardware used,
>>>> RAM allocated, and batch sizes of the source, sink, and channels).
>>>>
>>>> Following are the configuration details:
>>>> =======================
>>>>
>>>> I am using a machine with a reasonable hardware configuration:
>>>> quad-core 2.00 GHz processors and 4 GB RAM.
>>>>
>>>> Command line options passed to flume agent:
>>>> -DJAVA_OPTS="-Xms1g -Xmx4g -Dcom.sun.management.jmxremote
>>>> -XX:MaxDirectMemorySize=2g"
>>>>
>>>> Agent Configuration:
>>>> ============
>>>> agent.sources = avro-collection-source spooler
>>>> agent.channels = fileChannel
>>>> agent.sinks = hdfsSink fileSink
>>>>
>>>> # For each one of the sources, the type is defined
>>>>
>>>> agent.sources.spooler.type = spooldir
>>>> agent.sources.spooler.spoolDir = /root/test_data
>>>> agent.sources.spooler.batchSize = 1000
>>>> agent.sources.spooler.channels = fileChannel
>>>>
>>>> # Each sink's type must be defined
>>>> agent.sinks.hdfsSink.type = hdfs
>>>> agent.sinks.hdfsSink.hdfs.path = hdfs://mltest2001/flume/release3Test
>>>>
>>>> agent.sinks.hdfsSink.hdfs.fileType = DataStream
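The suggestions in this thread can be combined roughly as in the following Flume properties fragment. It is a minimal sketch, assuming Flume 1.3.x property names: source and sink batch sizes raised to 10,000 (Brock's suggestion), two HDFS sinks draining the same file channel into one HDFS directory with distinct file prefixes (Hari's and Bhaskar's suggestion), and file channel checkpoint/data directories spread over separate disks. The sink names `hdfsSink1`/`hdfsSink2`, the prefixes `sink1`/`sink2`, and the `/disk1`, `/disk2`, `/disk3` paths are hypothetical; the Avro source and the file sink from the quoted configuration are omitted for brevity.

```properties
# Sketch only: sink names, file prefixes and local disk paths are hypothetical.
# Check property names against the user guide for your Flume version.
agent.sources = spooler
agent.channels = fileChannel
agent.sinks = hdfsSink1 hdfsSink2

# Spooling-directory source with the larger batch size suggested in the thread
agent.sources.spooler.type = spooldir
agent.sources.spooler.spoolDir = /root/test_data
agent.sources.spooler.batchSize = 10000
agent.sources.spooler.channels = fileChannel

# File channel with checkpoint and data directories on separate disks.
# transactionCapacity must be at least as large as the largest batch size,
# otherwise a 10,000-event put or take will not fit in one transaction.
agent.channels.fileChannel.type = file
agent.channels.fileChannel.checkpointDir = /disk1/flume/checkpoint
agent.channels.fileChannel.dataDirs = /disk2/flume/data,/disk3/flume/data
agent.channels.fileChannel.transactionCapacity = 10000

# Two HDFS sinks read from the same channel, so two threads write to HDFS.
# Both use the same directory; distinct file prefixes avoid file-name collisions.
agent.sinks.hdfsSink1.type = hdfs
agent.sinks.hdfsSink1.channel = fileChannel
agent.sinks.hdfsSink1.hdfs.path = hdfs://mltest2001/flume/release3Test
agent.sinks.hdfsSink1.hdfs.filePrefix = sink1
agent.sinks.hdfsSink1.hdfs.fileType = DataStream
agent.sinks.hdfsSink1.hdfs.batchSize = 10000

agent.sinks.hdfsSink2.type = hdfs
agent.sinks.hdfsSink2.channel = fileChannel
agent.sinks.hdfsSink2.hdfs.path = hdfs://mltest2001/flume/release3Test
agent.sinks.hdfsSink2.hdfs.filePrefix = sink2
agent.sinks.hdfsSink2.hdfs.fileType = DataStream
agent.sinks.hdfsSink2.hdfs.batchSize = 10000
```

More sinks can be added the same way, each with its own `filePrefix`. Per Juhani's reply, the Flume build itself (before or after the FLUME-1609 change) may affect File Channel throughput independently of these settings.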