Flume >> mail # user >> Re: Avro sink to source is too slow


Mike Keane 2013-09-30, 16:21
Anat Rozenzon 2013-09-30, 19:29
Avro sink to source is too slow
Hi

I'm trying to read 100MB of files using a spooling directory source, a file channel and
4 avro sinks that send to an avro source running in another flume process.
Both flume processes are running on the same machine, just to eliminate
network issues.

However, it takes more than 5 minutes to read and pass the 100MB of data, which is
too slow for our needs.

After about 1 minute the files have been read into the file channel; then there is
quite a long period during which the file channel drains really slowly through the
four sinks.

Copying the same data using scp from a remote machine takes 7 seconds.
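
To put the two timings side by side, here is a rough sketch of the throughput they imply. It assumes the full 100MB is moved in both cases and treats "more than 5 minutes" as exactly 5 minutes, so the Flume figure is an upper bound on the real rate; the variable and function names are only illustrative.

```python
# Rough throughput comparison from the figures quoted above.
DATA_MB = 100            # total size of the spooled files
FLUME_SECONDS = 5 * 60   # "more than 5 minutes", so a lower bound on elapsed time
SCP_SECONDS = 7          # scp of the same data from a remote machine


def throughput_mb_per_s(size_mb, seconds):
    """Average transfer rate in MB per second."""
    return size_mb / seconds


flume_rate = throughput_mb_per_s(DATA_MB, FLUME_SECONDS)
scp_rate = throughput_mb_per_s(DATA_MB, SCP_SECONDS)
slowdown = scp_rate / flume_rate

print(f"Flume: at most {flume_rate:.2f} MB/s")
print(f"scp:   about {scp_rate:.1f} MB/s")
print(f"scp is at least {slowdown:.0f}x faster")
```

The comparison is not apples to apples (scp does no per-event processing, checkpointing or compression), but it shows the size of the gap.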

Below is my config. Is there anything I can do to improve this?

agent.sources = logsdir
agent.sources.logsdir.type = spooldir
agent.sources.logsdir.channels = fileChannel
agent.sources.logsdir.spoolDir = %%WORK_DIR%%
agent.sources.logsdir.fileHeader = true
agent.sources.logsdir.batchSize=1000
agent.sources.logsdir.deletePolicy=immediate
agent.sources.logsdir.interceptors =  ihost iserver_type iserver_id
agent.sources.logsdir.interceptors.ihost.type = host
agent.sources.logsdir.interceptors.ihost.useIP = false
agent.sources.logsdir.interceptors.ihost.hostHeader = server_hostname

agent.sources.logsdir.interceptors.iserver_type.type = static
agent.sources.logsdir.interceptors.iserver_type.key = server_type
agent.sources.logsdir.interceptors.iserver_type.value = %%SERVER_TYPE%%
agent.sources.logsdir.interceptors.iserver_id.type = static
agent.sources.logsdir.interceptors.iserver_id.key = server_id
agent.sources.logsdir.interceptors.iserver_id.value = %%SERVER_ID%%

agent.sources.logsdir.deserializer.maxLineLength = 10240
agent.channels = fileChannel
agent.channels.fileChannel.type = file
agent.channels.fileChannel.checkpointDir=%%WORK_DIR%%/flume/filechannel/checkpoint
agent.channels.fileChannel.dataDirs=%%WORK_DIR%%/flume/filechannel/data
agent.channels.fileChannel.capacity=2000000
agent.channels.fileChannel.transactionCapacity=1000
agent.channels.fileChannel.use-fast-replay=true
agent.channels.fileChannel.useDualCheckpoints=true
agent.channels.fileChannel.backupCheckpointDir=%%WORK_DIR%%/flume/filechannel/backupCheckpointDir
agent.channels.fileChannel.minimumRequiredSpace=1073741824
agent.channels.fileChannel.maxFileSize=524288000

## Send to  multiple Collectors for load balancing
agent.sinks = AvroSink1-1 AvroSink1-2 AvroSink1-3 AvroSink1-4

agent.sinks.AvroSink1-1.type = avro
agent.sinks.AvroSink1-1.channel = fileChannel
agent.sinks.AvroSink1-1.hostname = %%COLLECTOR1_SERVER%%
agent.sinks.AvroSink1-1.port = 4545%%COLLECTOR1_SLOT%%
agent.sinks.AvroSink1-1.connect-timeout = 60000
agent.sinks.AvroSink1-1.request-timeout = 60000
agent.sinks.AvroSink1-1.batch-size = 1000
agent.sinks.AvroSink1-1.compression-type=deflate
agent.sinks.AvroSink1-1.compression-level=9

agent.sinks.AvroSink1-2.type = avro
agent.sinks.AvroSink1-2.channel = fileChannel
agent.sinks.AvroSink1-2.hostname = %%COLLECTOR1_SERVER%%
agent.sinks.AvroSink1-2.port = 4545%%COLLECTOR1_SLOT%%
agent.sinks.AvroSink1-2.connect-timeout = 60000
agent.sinks.AvroSink1-2.request-timeout = 60000
agent.sinks.AvroSink1-2.batch-size = 1000
agent.sinks.AvroSink1-2.compression-type=deflate
agent.sinks.AvroSink1-2.compression-level=9

agent.sinks.AvroSink1-3.type = avro
agent.sinks.AvroSink1-3.channel = fileChannel
agent.sinks.AvroSink1-3.hostname = %%COLLECTOR1_SERVER%%
agent.sinks.AvroSink1-3.port = 4545%%COLLECTOR1_SLOT%%
agent.sinks.AvroSink1-3.connect-timeout = 60000
agent.sinks.AvroSink1-3.request-timeout = 60000
agent.sinks.AvroSink1-3.batch-size = 1000
agent.sinks.AvroSink1-3.compression-type=deflate
agent.sinks.AvroSink1-3.compression-level=9

agent.sinks.AvroSink1-4.type = avro
agent.sinks.AvroSink1-4.channel = fileChannel
agent.sinks.AvroSink1-4.hostname = %%COLLECTOR1_SERVER%%
agent.sinks.AvroSink1-4.port = 4545%%COLLECTOR1_SLOT%%
agent.sinks.AvroSink1-4.connect-timeout = 60000
agent.sinks.AvroSink1-4.request-timeout = 60000
agent.sinks.AvroSink1-4.batch-size = 1000
agent.sinks.AvroSink1-4.compression-type=deflate
agent.sinks.AvroSink1-4.compression-level=9
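
The four sink stanzas above differ only in the sink name, so they can be produced from a single list of settings. The following is a minimal sketch of that, not part of Flume itself: `render_sinks` and the two constants are hypothetical names, the values are copied from the config above, and the `%%...%%` placeholders are left untouched for whatever templating step normally fills them in. It writes every line as `key = value`, which Java properties files treat the same as `key=value`.

```python
# Generate the repeated avro sink stanzas from one list of settings.
SINK_NAMES = ["AvroSink1-1", "AvroSink1-2", "AvroSink1-3", "AvroSink1-4"]

# Settings shared by every sink, copied from the config above.
SINK_SETTINGS = [
    ("type", "avro"),
    ("channel", "fileChannel"),
    ("hostname", "%%COLLECTOR1_SERVER%%"),
    ("port", "4545%%COLLECTOR1_SLOT%%"),
    ("connect-timeout", "60000"),
    ("request-timeout", "60000"),
    ("batch-size", "1000"),
    ("compression-type", "deflate"),
    ("compression-level", "9"),
]


def render_sinks(agent="agent", names=SINK_NAMES, settings=SINK_SETTINGS):
    """Return the sink section of the properties file as one string."""
    lines = [f"{agent}.sinks = {' '.join(names)}", ""]
    for name in names:
        for key, value in settings:
            lines.append(f"{agent}.sinks.{name}.{key} = {value}")
        lines.append("")  # blank line between stanzas
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(render_sinks(), end="")
```

Changing a value such as the batch size or compression level in `SINK_SETTINGS` then applies it to all four sinks at once, which makes it easier to test one setting at a time.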

Thanks
Anat
Mike Keane 2013-09-30, 20:02
Roshan Naik 2013-09-30, 22:50
Anat Rozenzon 2013-10-01, 12:41
Roshan Naik 2013-10-01, 20:10
Anat Rozenzon 2013-10-03, 10:43
Hari Shreedharan 2013-10-03, 15:28
Anat Rozenzon 2013-10-03, 19:54