Re: Configuring flume for better throughput
I'm continuing to debug the performance issues. I added more sinks, but it
all seems to boil down to the performance of the FileChannel. Right now I'm
focusing on the HDFS writer machine. That machine has 4 disks (apart from a
separate disk just for the OS), so I'm using 4 file channels, each with its
checkpoint and data directories on its own dedicated disk. As mentioned
earlier, Avro Sinks write to these FileChannels and HDFS Sinks drain them.
Draining is very slow: ~2.5 MB/s for all 4 channels combined. As a test I
replaced the file channels with memory channels and could drain at more than
15 MB/s, so the HDFS sinks aren't the issue.

I haven't seen any issue with writing to the FileChannel so far, so I'm
surprised that reading is turning out to be slower. Here are the
FileChannel stats:
"CHANNEL.ch1": {
        "ChannelCapacity": "75000000",
        "ChannelFillPercentage": "7.5033080000000005",
        "ChannelSize": "5627481",
        "EventPutAttemptCount": "11465743",
        "EventPutSuccessCount": "11465481",
        "EventTakeAttemptCount": "5841907",
        "EventTakeSuccessCount": "5838000",
        "StartTime": "1375320933471",
        "StopTime": "0",
        "Type": "CHANNEL"
    },

EventTakeAttemptCount is much lower than EventPutAttemptCount, so the sinks
are lagging. I'm surprised that even the number of attempts to drain the
channel is lower. That would seem to point to the HDFS sinks, but they do
just fine with the Memory Channel, so they are clearly not bound on either
writing to HDFS or on network I/O. I've checked the network separately as
well: we are using less than 10% of its capacity, so we are definitely not
bound there.
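
As a sanity check on those counters, here is a minimal Python sketch that derives the backlog from the CHANNEL.ch1 stats above. The numbers are copied from the stats; the function name `summarize` and the dictionary keys it returns are only illustrative, and the calculation assumes the channel was empty when the counters started.

```python
# Counter values copied from the CHANNEL.ch1 stats above.
stats = {
    "ChannelCapacity": "75000000",
    "ChannelSize": "5627481",
    "EventPutAttemptCount": "11465743",
    "EventPutSuccessCount": "11465481",
    "EventTakeAttemptCount": "5841907",
    "EventTakeSuccessCount": "5838000",
}


def summarize(stats):
    """Derive backlog, fill percentage, and take/put ratio from channel counters."""
    puts = int(stats["EventPutSuccessCount"])
    takes = int(stats["EventTakeSuccessCount"])
    capacity = int(stats["ChannelCapacity"])
    backlog = puts - takes  # events put but not yet taken
    return {
        "backlog": backlog,
        "fill_pct": 100.0 * backlog / capacity,
        "take_put_ratio": takes / puts,
    }


summary = summarize(stats)
print(summary)
```

The backlog computed this way equals the reported ChannelSize, and the sinks have taken only about half of what has been put, which is consistent with the sinks falling steadily behind.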

In my workflow the reliability of FileChannel is essential, so I can't switch
to the Memory Channel. I would really appreciate any suggestions on how to
tune the performance of FileChannel. Here are the settings of one of the
FileChannels:

agent1.channels.ch1.type = FILE
agent1.channels.ch1.checkpointDir = /flume1/checkpoint
agent1.channels.ch1.dataDirs = /flume1/data
agent1.channels.ch1.maxFileSize = 375809638400
agent1.channels.ch1.capacity = 75000000
agent1.channels.ch1.transactionCapacity = 24000
agent1.channels.ch1.checkpointInterval = 300000

As can be seen, I increased the checkpointInterval, but that didn't help
either.

Here are the settings for one of the HDFS Sinks. I have tried varying the
number of these sinks from 8 to 32 to no effect:
agent1.sinks.hdfs-sink1-1.channel = ch1
agent1.sinks.hdfs-sink1-1.type = hdfs
#Use DNS of the HDFS namenode
agent1.sinks.hdfs-sink1-1.hdfs.path = hdfs://nameservice1/store/f-1-1/
agent1.sinks.hdfs-sink1-1.hdfs.filePrefix = event
agent1.sinks.hdfs-sink1-1.hdfs.writeFormat = Text
agent1.sinks.hdfs-sink1-1.hdfs.rollInterval = 120
agent1.sinks.hdfs-sink1-1.hdfs.idleTimeout = 180
agent1.sinks.hdfs-sink1-1.hdfs.rollCount = 0
agent1.sinks.hdfs-sink1-1.hdfs.rollSize = 0
agent1.sinks.hdfs-sink1-1.hdfs.fileType = DataStream
agent1.sinks.hdfs-sink1-1.hdfs.batchSize = 1000
agent1.sinks.hdfs-sink1-1.hdfs.txnEventSize = 1000
agent1.sinks.hdfs-sink1-1.hdfs.callTimeout = 20000
agent1.sinks.hdfs-sink1-1.hdfs.threadsPoolSize = 1

I've tried increasing the batchSize (along with txnEventSize) of the HDFS
Sink from 1000 to 240000, without effect.
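
One thing worth ruling out here: as far as I understand, a sink takes its whole batch inside a single channel transaction, so a batchSize larger than the channel's transactionCapacity would not be expected to help and may cause failed takes. The sketch below checks a properties snippet for that mismatch. The values 24000 and 240000 come from the settings above; the helper names `parse_properties` and `oversized_batches` are hypothetical and not part of Flume.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def oversized_batches(props, agent="agent1"):
    """Return (sink, batchSize, transactionCapacity) for each HDFS sink
    whose batch is larger than its channel's transaction capacity."""
    problems = []
    sink_prefix = agent + ".sinks."
    suffix = ".hdfs.batchSize"
    for key, value in props.items():
        if key.startswith(sink_prefix) and key.endswith(suffix):
            sink = key[len(sink_prefix):-len(suffix)]
            channel = props.get(f"{sink_prefix}{sink}.channel")
            cap = props.get(f"{agent}.channels.{channel}.transactionCapacity")
            if cap is not None and int(value) > int(cap):
                problems.append((sink, int(value), int(cap)))
    return problems


# Settings from this message, with batchSize raised to the largest value tried.
sample = """
agent1.channels.ch1.transactionCapacity = 24000
agent1.sinks.hdfs-sink1-1.channel = ch1
agent1.sinks.hdfs-sink1-1.hdfs.batchSize = 240000
"""
print(oversized_batches(parse_properties(sample)))
```

With the original batchSize of 1000 the check reports nothing; it only flags the sink once the batch exceeds 24000.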

I've also verified that the box has enough RAM for the page cache, and
iostat shows almost no reads going to disk. I really can't figure out why
FileChannel would be so much slower than the Memory Channel if reads are
being served from memory.

FileChannel is fundamental to our workflow, and I would expect it to be for
others too. What has been the experience of others with FileChannel? I
would really appreciate any suggestions.

Thanks in advance,
Pankaj

On Fri, Jul 26, 2013 at 2:12 PM, Pankaj Gupta <[EMAIL PROTECTED]> wrote:

> Here is the flume config of the collector machine. The File channel is
> drained by 4 flume sinks that send messages to a separate hdfs-writer

Pankaj Gupta | Software Engineer