Jagadish Bihani 2012-12-12, 10:05
Jagadish Bihani 2012-12-12, 10:08
Brock Noland 2012-12-12, 15:36
Hari Shreedharan 2012-12-12, 17:53
Bhaskar V. Karambelkar 2012-12-12, 21:13
Hari Shreedharan 2012-12-12, 21:44
Re: Recommendation of parameters for better performance with File Channel
Jagadish Bihani 2012-12-18, 11:05
Hi,

Thanks for the inputs, Hari and Brock. I tried a batch size of 10000, and throughput increased from 1.5 to 1.8 MB/sec. Then I used multiple HDFS sinks reading from the same channel and got around 2.3 MB/sec.

Regards,
Jagadish

On 12/13/2012 03:14 AM, Hari Shreedharan wrote:
> Yep, each sink with a different prefix will work fine too. My
> suggestion was just meant to avoid collision - file prefixes are good
> enough for that.
>
> --
> Hari Shreedharan
>
> On Wednesday, December 12, 2012 at 1:13 PM, Bhaskar V. Karambelkar wrote:
>
>> Hari,
>> If each sink uses a different file prefix, what's the need to write to
>> multiple HDFS directories?
>> All our sinks write to the same HDFS directory and each uses a unique
>> file prefix, and it seems to work fine.
>> Also haven't found anything in flume code or HDFS APIs which suggests
>> that two sinks can't write to the same directory.
>>
>> Just curious.
>> thanks
>>
>>
>> On Wed, Dec 12, 2012 at 12:53 PM, Hari Shreedharan
>> <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> wrote:
>>> Also note that having multiple sinks often improves performance -
>>> though you should have each sink write to a different directory on
>>> HDFS. Since each sink really uses only one thread at a time to write,
>>> having multiple sinks allows multiple threads to write to HDFS. Also
>>> if you can spare additional disks on your Flume agent machine for
>>> file channel data directories, that will also improve performance.
>>>
>>>
>>> Hari
>>>
>>> --
>>> Hari Shreedharan
>>>
>>> On Wednesday, December 12, 2012 at 7:36 AM, Brock Noland wrote:
>>>
>>> Hi,
>>>
>>> Why not try increasing the batch size on the source and sink to 10,000?
>>>
>>> Brock
>>>
>>> On Wed, Dec 12, 2012 at 4:08 AM, Jagadish Bihani
>>> <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>>
>>> wrote:
>>>
>>>
>>> I am using the latest release of Flume (Flume 1.3.0) and Hadoop 1.0.3.
>>>
>>>
>>> On 12/12/2012 03:35 PM, Jagadish Bihani wrote:
>>>
>>>
>>> Hi
>>>
>>> I am able to write a maximum of 1.5 MB/sec of data to HDFS (without
>>> compression) using File Channel. Are there any recommendations to
>>> improve the performance?
>>> Has anybody achieved around 10 MB/sec with file channel? If yes,
>>> please share the configuration (hardware used, RAM allocated and
>>> batch sizes of source, sink and channels).
>>>
>>> Following are the configuration details:
>>> =======================
>>>
>>> I am using a machine with reasonable hardware configuration:
>>> Quadcore 2.00 GHz processors and 4 GB RAM.
>>>
>>> Command line options passed to flume agent:
>>> -DJAVA_OPTS="-Xms1g -Xmx4g -Dcom.sun.management.jmxremote
>>> -XX:MaxDirectMemorySize=2g"
>>>
>>> Agent Configuration:
>>> ============
>>> agent.sources = avro-collection-source spooler
>>> agent.channels = fileChannel
>>> agent.sinks = hdfsSink fileSink
>>>
>>> # For each one of the sources, the type is defined
>>>
>>> agent.sources.spooler.type = spooldir
>>> agent.sources.spooler.spoolDir =/root/test_data
>>> agent.sources.spooler.batchSize = 1000
>>> agent.sources.spooler.channels = fileChannel
>>>
>>> # Each sink's type must be defined
>>> agent.sinks.hdfsSink.type = hdfs
>>> agent.sinks.hdfsSink.hdfs.path=hdfs://mltest2001/flume/release3Test
>>>
>>> agent.sinks.hdfsSink.hdfs.fileType =DataStream
>>> agent.sinks.hdfsSink.hdfs.rollSize=0
>>> agent.sinks.hdfsSink.hdfs.rollCount=0
>>> agent.sinks.hdfsSink.hdfs.batchSize=1000
>>> agent.sinks.hdfsSink.hdfs.rollInterval=60
>>>
>>> agent.sinks.hdfsSink.channel= fileChannel
>>>
>>> agent.channels.fileChannel.type=file
>>> agent.channels.fileChannel.dataDirs=/root/flume_channel/dataDir13
>>>
>>> agent.channels.fileChannel.checkpointDir=/root/flume_channel/checkpointDir13
>>>
>>> Regards,
>>> Jagadish
>>>
>>>
>>> --
>>> Apache MRUnit - Unit testing MapReduce -
>>> http://incubator.apache.org/mrunit/
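
For reference, the suggestions made in this thread (a batch size of 10000 on source and sink, several HDFS sinks draining the same file channel with distinct file prefixes, and file channel data directories spread over separate disks) could be combined roughly as below. This is only a sketch, not the configuration anyone in the thread actually ran: the sink names hdfsSink1 and hdfsSink2, the prefixes sink1 and sink2, and the /disk1 and /disk2 paths are made-up placeholders, and the transactionCapacity line rests on the assumption that the channel's per-transaction limit has to be at least as large as the batch size.

```properties
agent.sources = spooler
agent.channels = fileChannel
# Two sinks on one channel, so two threads can write to HDFS in parallel.
# The sink names are hypothetical.
agent.sinks = hdfsSink1 hdfsSink2

agent.sources.spooler.type = spooldir
agent.sources.spooler.spoolDir = /root/test_data
# Batch size raised from 1000 to 10000, as suggested by Brock.
agent.sources.spooler.batchSize = 10000
agent.sources.spooler.channels = fileChannel

agent.sinks.hdfsSink1.type = hdfs
agent.sinks.hdfsSink1.channel = fileChannel
agent.sinks.hdfsSink1.hdfs.path = hdfs://mltest2001/flume/release3Test
# A unique prefix per sink avoids file name collisions in a shared directory.
agent.sinks.hdfsSink1.hdfs.filePrefix = sink1
agent.sinks.hdfsSink1.hdfs.fileType = DataStream
agent.sinks.hdfsSink1.hdfs.batchSize = 10000
agent.sinks.hdfsSink1.hdfs.rollSize = 0
agent.sinks.hdfsSink1.hdfs.rollCount = 0
agent.sinks.hdfsSink1.hdfs.rollInterval = 60

agent.sinks.hdfsSink2.type = hdfs
agent.sinks.hdfsSink2.channel = fileChannel
agent.sinks.hdfsSink2.hdfs.path = hdfs://mltest2001/flume/release3Test
agent.sinks.hdfsSink2.hdfs.filePrefix = sink2
agent.sinks.hdfsSink2.hdfs.fileType = DataStream
agent.sinks.hdfsSink2.hdfs.batchSize = 10000
agent.sinks.hdfsSink2.hdfs.rollSize = 0
agent.sinks.hdfsSink2.hdfs.rollCount = 0
agent.sinks.hdfsSink2.hdfs.rollInterval = 60

agent.channels.fileChannel.type = file
# Hypothetical paths: one data directory per physical disk, comma-separated.
agent.channels.fileChannel.checkpointDir = /disk1/flume_channel/checkpoint
agent.channels.fileChannel.dataDirs = /disk1/flume_channel/data,/disk2/flume_channel/data
# Assumption: must be >= the largest batch size used by the source and sinks.
agent.channels.fileChannel.transactionCapacity = 10000
```

As Hari and Bhaskar note above, writing each sink to its own HDFS directory instead of using distinct prefixes would serve the same purpose of avoiding collisions.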
Juhani Connolly 2012-12-19, 09:23