Flume, mail # user - Avro sink to source is too slow


Avro sink to source is too slow
Anat Rozenzon 2013-09-30, 16:10
Hi

I'm trying to read 100 MB of files using a spooling directory source, a file
channel and four Avro sinks that send to an Avro source running in another
Flume process. Both Flume processes run on the same machine, just to rule out
network issues.

However, it takes more than 5 minutes to read and pass the 100 MB of data,
which is too slow for our needs.

After about 1 minute the files have all been read into the file channel, and
then for quite a long time the file channel drains really slowly through the
four sinks.

Copying the same data using scp from a remote machine takes 7 seconds.
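
For a rough sense of the gap, here is a small back-of-the-envelope calculation. It is only a sketch: it treats "more than 5 minutes" as exactly 300 seconds (so the Flume rate is an upper bound) and takes 100 MB and 7 seconds at face value. The function and variable names are made up for illustration.

```python
# Figures from the message above. "More than 5 minutes" is assumed to be
# exactly 300 s, so the Flume rate computed here is an upper bound.
DATA_MB = 100
FLUME_SECONDS = 5 * 60
SCP_SECONDS = 7


def throughput_mb_per_s(size_mb, seconds):
    """Average throughput in MB/s for a transfer of size_mb taking seconds."""
    return size_mb / seconds


flume_rate = throughput_mb_per_s(DATA_MB, FLUME_SECONDS)
scp_rate = throughput_mb_per_s(DATA_MB, SCP_SECONDS)
slowdown = FLUME_SECONDS / SCP_SECONDS

print(f"Flume pipeline: at most {flume_rate:.2f} MB/s")
print(f"scp:            about {scp_rate:.2f} MB/s")
print(f"scp is roughly {slowdown:.0f}x faster")
```

Under these assumptions the Flume pipeline moves well under 1 MB/s, while scp moves more than 14 MB/s.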

Below is my config. Is there anything I can do to improve this?

agent.sources = logsdir
agent.sources.logsdir.type = spooldir
agent.sources.logsdir.channels = fileChannel
agent.sources.logsdir.spoolDir = %%WORK_DIR%%
agent.sources.logsdir.fileHeader = true
agent.sources.logsdir.batchSize=1000
agent.sources.logsdir.deletePolicy=immediate
agent.sources.logsdir.interceptors =  ihost iserver_type iserver_id
agent.sources.logsdir.interceptors.ihost.type = host
agent.sources.logsdir.interceptors.ihost.useIP = false
agent.sources.logsdir.interceptors.ihost.hostHeader = server_hostname

agent.sources.logsdir.interceptors.iserver_type.type = static
agent.sources.logsdir.interceptors.iserver_type.key = server_type
agent.sources.logsdir.interceptors.iserver_type.value = %%SERVER_TYPE%%
agent.sources.logsdir.interceptors.iserver_id.type = static
agent.sources.logsdir.interceptors.iserver_id.key = server_id
agent.sources.logsdir.interceptors.iserver_id.value = %%SERVER_ID%%

agent.sources.logsdir.deserializer.maxLineLength = 10240
agent.channels = fileChannel
agent.channels.fileChannel.type = file
agent.channels.fileChannel.checkpointDir=%%WORK_DIR%%/flume/filechannel/checkpoint
agent.channels.fileChannel.dataDirs=%%WORK_DIR%%/flume/filechannel/data
agent.channels.fileChannel.capacity=2000000
agent.channels.fileChannel.transactionCapacity=1000
agent.channels.fileChannel.use-fast-replay=true
agent.channels.fileChannel.useDualCheckpoints=true
agent.channels.fileChannel.backupCheckpointDir=%%WORK_DIR%%/flume/filechannel/backupCheckpointDir
agent.channels.fileChannel.minimumRequiredSpace=1073741824
agent.channels.fileChannel.maxFileSize=524288000

## Send to multiple collectors for load balancing
agent.sinks = AvroSink1-1 AvroSink1-2 AvroSink1-3 AvroSink1-4

agent.sinks.AvroSink1-1.type = avro
agent.sinks.AvroSink1-1.channel = fileChannel
agent.sinks.AvroSink1-1.hostname = %%COLLECTOR1_SERVER%%
agent.sinks.AvroSink1-1.port = 4545%%COLLECTOR1_SLOT%%
agent.sinks.AvroSink1-1.connect-timeout = 60000
agent.sinks.AvroSink1-1.request-timeout = 60000
agent.sinks.AvroSink1-1.batch-size = 1000
agent.sinks.AvroSink1-1.compression-type=deflate
agent.sinks.AvroSink1-1.compression-level=9

agent.sinks.AvroSink1-2.type = avro
agent.sinks.AvroSink1-2.channel = fileChannel
agent.sinks.AvroSink1-2.hostname = %%COLLECTOR1_SERVER%%
agent.sinks.AvroSink1-2.port = 4545%%COLLECTOR1_SLOT%%
agent.sinks.AvroSink1-2.connect-timeout = 60000
agent.sinks.AvroSink1-2.request-timeout = 60000
agent.sinks.AvroSink1-2.batch-size = 1000
agent.sinks.AvroSink1-2.compression-type=deflate
agent.sinks.AvroSink1-2.compression-level=9

agent.sinks.AvroSink1-3.type = avro
agent.sinks.AvroSink1-3.channel = fileChannel
agent.sinks.AvroSink1-3.hostname = %%COLLECTOR1_SERVER%%
agent.sinks.AvroSink1-3.port = 4545%%COLLECTOR1_SLOT%%
agent.sinks.AvroSink1-3.connect-timeout = 60000
agent.sinks.AvroSink1-3.request-timeout = 60000
agent.sinks.AvroSink1-3.batch-size = 1000
agent.sinks.AvroSink1-3.compression-type=deflate
agent.sinks.AvroSink1-3.compression-level=9

agent.sinks.AvroSink1-4.type = avro
agent.sinks.AvroSink1-4.channel = fileChannel
agent.sinks.AvroSink1-4.hostname = %%COLLECTOR1_SERVER%%
agent.sinks.AvroSink1-4.port = 4545%%COLLECTOR1_SLOT%%
agent.sinks.AvroSink1-4.connect-timeout = 60000
agent.sinks.AvroSink1-4.request-timeout = 60000
agent.sinks.AvroSink1-4.batch-size = 1000
agent.sinks.AvroSink1-4.compression-type=deflate
agent.sinks.AvroSink1-4.compression-level=9

Thanks
Anat