Flume, mail # user - Recommendation of parameters for better performance with File Channel


Re: Recommendation of parameters for better performance with File Channel
Jagadish Bihani 2012-12-18, 11:05
Hi

Thanks for the inputs Hari and Brock.
I tried a batch size of 10000, and throughput increased from 1.5 to
1.8 MB/sec.
Then I used multiple HDFS sinks reading from the same channel, and I
could get around 2.3 MB/sec.

Regards,
Jagadish
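
For readers following along, the setup reported above might look roughly like the sketch below: two HDFS sinks draining the same file channel, each with a batch size of 10000. The sink names (hdfsSink1, hdfsSink2) and the file prefixes are hypothetical and are not taken from the actual configuration used in this thread. Raising transactionCapacity on the channel is also an assumption, made on the basis that a batch has to fit inside a single channel transaction.

```properties
# Sketch only: sink names and prefixes are illustrative, not from the thread.
agent.sinks = hdfsSink1 hdfsSink2

# Source batch size raised to match the sinks.
agent.sources.spooler.batchSize = 10000

# Assumption: the channel must accept transactions as large as the batch.
agent.channels.fileChannel.transactionCapacity = 10000

# First sink: same HDFS directory, unique file prefix to avoid collisions.
agent.sinks.hdfsSink1.type = hdfs
agent.sinks.hdfsSink1.channel = fileChannel
agent.sinks.hdfsSink1.hdfs.path = hdfs://mltest2001/flume/release3Test
agent.sinks.hdfsSink1.hdfs.filePrefix = sink1
agent.sinks.hdfsSink1.hdfs.fileType = DataStream
agent.sinks.hdfsSink1.hdfs.batchSize = 10000

# Second sink: identical except for the prefix.
agent.sinks.hdfsSink2.type = hdfs
agent.sinks.hdfsSink2.channel = fileChannel
agent.sinks.hdfsSink2.hdfs.path = hdfs://mltest2001/flume/release3Test
agent.sinks.hdfsSink2.hdfs.filePrefix = sink2
agent.sinks.hdfsSink2.hdfs.fileType = DataStream
agent.sinks.hdfsSink2.hdfs.batchSize = 10000
```

Each sink runs on its own thread, so the two sinks write to HDFS in parallel while taking events from the one channel.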

On 12/13/2012 03:14 AM, Hari Shreedharan wrote:
> Yep, each sink with a different prefix will work fine too. My
> suggestion was just meant to avoid collision - file prefixes are good
> enough for that.
>
> --
> Hari Shreedharan
>
> On Wednesday, December 12, 2012 at 1:13 PM, Bhaskar V. Karambelkar wrote:
>
>> Hari,
>> If each sink uses a different file prefix, what's the need to write to
>> multiple HDFS directories?
>> All our sinks write to the same HDFS directory and each uses a unique
>> file prefix, and it seems to work fine.
>> Also, I haven't found anything in the Flume code or HDFS APIs which
>> suggests that two sinks can't write to the same directory.
>>
>> Just curious.
>> thanks
>>
>>
>> On Wed, Dec 12, 2012 at 12:53 PM, Hari Shreedharan <[EMAIL PROTECTED]> wrote:
>>> Also note that having multiple sinks often improves performance -
>>> though you should have each sink write to a different directory on
>>> HDFS. Since each sink really uses only one thread at a time to write,
>>> having multiple sinks allows multiple threads to write to HDFS. Also,
>>> if you can spare additional disks on your Flume agent machine for
>>> file channel data directories, that will also improve performance.
>>>
>>>
>>>
>>> Hari
>>>
>>> --
>>> Hari Shreedharan
>>>
>>> On Wednesday, December 12, 2012 at 7:36 AM, Brock Noland wrote:
>>>
>>> Hi,
>>>
>>> Why not try increasing the batch size on the source and sink to 10,000?
>>>
>>> Brock
>>>
>>> On Wed, Dec 12, 2012 at 4:08 AM, Jagadish Bihani <[EMAIL PROTECTED]> wrote:
>>>
>>>
>>> I am using the latest release of Flume (1.3.0) and Hadoop 1.0.3.
>>>
>>>
>>> On 12/12/2012 03:35 PM, Jagadish Bihani wrote:
>>>
>>>
>>> Hi
>>>
>>> I am able to write at most 1.5 MB/sec of data to HDFS (without
>>> compression) using File Channel. Are there any recommendations to
>>> improve the performance?
>>> Has anybody achieved around 10 MB/sec with File Channel? If yes, please
>>> share the configuration (hardware used, RAM allocated, and batch sizes
>>> of source, sink and channels).
>>>
>>> Following are the configuration details :
>>> =======================
>>>
>>> I am using a machine with reasonable hardware configuration:
>>> Quadcore 2.00 GHz processors and 4 GB RAM.
>>>
>>> Command line options passed to flume agent :
>>> -DJAVA_OPTS="-Xms1g -Xmx4g -Dcom.sun.management.jmxremote
>>> -XX:MaxDirectMemorySize=2g"
>>>
>>> Agent Configuration:
>>> ============
>>> agent.sources = avro-collection-source spooler
>>> agent.channels = fileChannel
>>> agent.sinks = hdfsSink fileSink
>>>
>>> # For each one of the sources, the type is defined
>>>
>>> agent.sources.spooler.type = spooldir
>>> agent.sources.spooler.spoolDir =/root/test_data
>>> agent.sources.spooler.batchSize = 1000
>>> agent.sources.spooler.channels = fileChannel
>>>
>>> # Each sink's type must be defined
>>> agent.sinks.hdfsSink.type = hdfs
>>> agent.sinks.hdfsSink.hdfs.path=hdfs://mltest2001/flume/release3Test
>>>
>>> agent.sinks.hdfsSink.hdfs.fileType =DataStream
>>> agent.sinks.hdfsSink.hdfs.rollSize=0
>>> agent.sinks.hdfsSink.hdfs.rollCount=0
>>> agent.sinks.hdfsSink.hdfs.batchSize=1000
>>> agent.sinks.hdfsSink.hdfs.rollInterval=60
>>>
>>> agent.sinks.hdfsSink.channel= fileChannel
>>>
>>> agent.channels.fileChannel.type=file
>>> agent.channels.fileChannel.dataDirs=/root/flume_channel/dataDir13
>>>
>>> agent.channels.fileChannel.checkpointDir=/root/flume_channel/checkpointDir13
>>>
>>> Regards,
>>> Jagadish
>>>
>>>
>>>
>>>
>>> --
>>> Apache MRUnit - Unit testing MapReduce -
>>> http://incubator.apache.org/mrunit/
>
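
The suggestion in the quoted thread about sparing additional disks for the file channel could be sketched as below. The mount points /disk1, /disk2 and /disk3 are hypothetical; the only assumption about Flume itself is that dataDirs accepts a comma-separated list of directories, so that the data directories can sit on separate physical disks from each other and from the checkpoint directory.

```properties
# Sketch only: mount points are illustrative, not from the thread.
agent.channels.fileChannel.type = file

# Checkpoint on one disk.
agent.channels.fileChannel.checkpointDir = /disk1/flume_channel/checkpointDir

# Data directories spread over two other disks (comma-separated list).
agent.channels.fileChannel.dataDirs = /disk2/flume_channel/dataDir,/disk3/flume_channel/dataDir
```

Whether this helps depends on the directories really being on different physical devices; several directories on the same disk would still compete for the same I/O.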
