Recommendation of parameters for better performance with File Channel
Jagadish Bihani 2012-12-12, 10:05
Hi,

I am able to write at most 1.5 MB/sec of data to HDFS (without compression) using File Channel. Are there any recommendations to improve the performance? Has anybody achieved around 10 MB/sec with File Channel? If so, please share the configuration (hardware used, RAM allocated, and batch sizes of source, sink, and channels).

Following are the configuration details:
=======================

I am using a machine with a reasonable hardware configuration: quad-core 2.00 GHz processors and 4 GB RAM.

Command line options passed to the Flume agent:
-DJAVA_OPTS="-Xms1g -Xmx4g -Dcom.sun.management.jmxremote -XX:MaxDirectMemorySize=2g"

Agent Configuration:
============
agent.sources = avro-collection-source spooler
agent.channels = fileChannel
agent.sinks = hdfsSink fileSink

# For each one of the sources, the type is defined
agent.sources.spooler.type = spooldir
agent.sources.spooler.spoolDir = /root/test_data
agent.sources.spooler.batchSize = 1000
agent.sources.spooler.channels = fileChannel

# Each sink's type must be defined
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.path = hdfs://mltest2001/flume/release3Test
agent.sinks.hdfsSink.hdfs.fileType = DataStream
agent.sinks.hdfsSink.hdfs.rollSize = 0
agent.sinks.hdfsSink.hdfs.rollCount = 0
agent.sinks.hdfsSink.hdfs.batchSize = 1000
agent.sinks.hdfsSink.hdfs.rollInterval = 60
agent.sinks.hdfsSink.channel = fileChannel

agent.channels.fileChannel.type = file
agent.channels.fileChannel.dataDirs = /root/flume_channel/dataDir13
agent.channels.fileChannel.checkpointDir = /root/flume_channel/checkpointDir13

Regards,
Jagadish
-
Re: Recommendation of parameters for better performance with File Channel
Jagadish Bihani 2012-12-12, 10:08
I am using the latest release of Flume (1.3.0) with Hadoop 1.0.3.
-
Re: Recommendation of parameters for better performance with File Channel
Brock Noland 2012-12-12, 15:36
Hi,

Why not try increasing the batch size on the source and sink to 10,000?

Brock

--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
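Applied to the configuration from the first message, the batch size suggestion might look like the sketch below. Only the two batchSize values come from the thread. The transactionCapacity line is an assumption added here: a File Channel transaction has to be able to hold one full batch, and the default in Flume 1.3.0 is believed to be 1000, so please check it against the user guide for your version.

```properties
# Sketch: raise source and sink batch sizes from 1000 to 10000, as suggested in the thread
agent.sources.spooler.batchSize = 10000
agent.sinks.hdfsSink.hdfs.batchSize = 10000

# Assumption (not from the thread): the channel transaction must fit a whole batch
agent.channels.fileChannel.transactionCapacity = 10000
```

Larger batches mean fewer channel transactions per event, at the cost of more events held in memory per transaction.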
-
Re: Recommendation of parameters for better performance with File Channel
Hari Shreedharan 2012-12-12, 17:53
Also note that having multiple sinks often improves performance, though you should have each sink write to a different directory on HDFS. Since each sink really uses only one thread at a time to write, having multiple sinks allows multiple threads to write to HDFS. Also, if you can spare additional disks on your Flume agent machine for File Channel data directories, that will improve performance as well.

Hari

--
Hari Shreedharan
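A minimal sketch of both suggestions, assuming the agent and channel names from the first message. The second sink name (hdfsSink2), the sink1/sink2 subdirectories, and the /disk2 mount point are hypothetical names chosen for illustration; the remaining hdfs.* settings from the original configuration would be repeated for each sink.

```properties
# Sketch: two HDFS sinks draining the same File Channel, each writing on its own thread
agent.sinks = hdfsSink hdfsSink2

agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = fileChannel
agent.sinks.hdfsSink.hdfs.path = hdfs://mltest2001/flume/release3Test/sink1

# hdfsSink2 and the sink1/sink2 subdirectories are hypothetical names
agent.sinks.hdfsSink2.type = hdfs
agent.sinks.hdfsSink2.channel = fileChannel
agent.sinks.hdfsSink2.hdfs.path = hdfs://mltest2001/flume/release3Test/sink2

# Spread File Channel data across disks; /disk2/... is a hypothetical mount point
agent.channels.fileChannel.dataDirs = /root/flume_channel/dataDir13,/disk2/flume_channel/dataDir13
```

The extra data directory only helps if it sits on a physically separate disk; two directories on the same disk would still compete for the same I/O.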
-
Re: Recommendation of parameters for better performance with File Channel
Bhaskar V. Karambelkar 2012-12-12, 21:13
Hari,

If each sink uses a different file prefix, why is there a need to write to multiple HDFS directories? All our sinks write to the same HDFS directory, each with a unique file prefix, and it seems to work fine. I also haven't found anything in the Flume code or HDFS APIs which suggests that two sinks can't write to the same directory.

Just curious.
Thanks
-
Re: Recommendation of parameters for better performance with File Channel
Hari Shreedharan 2012-12-12, 21:44
Yep, each sink with a different prefix will work fine too. My suggestion was just meant to avoid collisions, and file prefixes are good enough for that.

--
Hari Shreedharan
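The file prefix alternative could be sketched as follows, assuming two HDFS sinks on the same channel. The second sink name (hdfsSink2) and the prefix values are hypothetical; hdfs.filePrefix is the HDFS sink property that sets the name prefix of the files a sink creates.

```properties
# Sketch: both sinks write to the same HDFS directory, distinguished by file prefix
agent.sinks.hdfsSink.hdfs.path = hdfs://mltest2001/flume/release3Test
agent.sinks.hdfsSink.hdfs.filePrefix = sink1

# hdfsSink2 and the sink1/sink2 prefixes are hypothetical names
agent.sinks.hdfsSink2.hdfs.path = hdfs://mltest2001/flume/release3Test
agent.sinks.hdfsSink2.hdfs.filePrefix = sink2
```

With distinct prefixes the two sinks should never try to create the same file name, which is the collision the separate directories were meant to avoid.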
-
Re: Recommendation of parameters for better performance with File Channel
Jagadish Bihani 2012-12-18, 11:05
Hi,

Thanks for the inputs, Hari and Brock. I tried a batch size of 10000, and throughput increased from 1.5 to 1.8 MB/sec. Then I used multiple HDFS sinks reading from the same channel and got around 2.3 MB/sec.

Regards,
Jagadish
-
Re: Recommendation of parameters for better performance with File Channel
Juhani Connolly 2012-12-19, 09:23
Hi Jagadish,

You may want to check out the mail thread "Re: Flume 1.3.0 - NFS + File Channel Performance". It turns out the changes in FLUME-1609 affect FileChannel performance a fair bit (even on normal non-NFS file systems). We ran a version of 1.3 built from an earlier trunk, and took a big performance hit when we switched to the 1.3 release. I isolated it to the FLUME-1609 patch. After building the 1.4 trunk and installing it, performance was back to normal.