Avro sink to source is too slow

What kind of disk configuration are you using for your file channel? With a single-disk configuration (a Dell blade server), performance was awful. I believe that, at a minimum, Flume needs separate disks for the checkpoint and data directories. When I switched to an SSD or a 13-disk RAID setup, my problems went away, with one exception: compression was still very slow. I ended up distributing my flow over several file channels to get good throughput with compression.
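
For illustration, here is a minimal sketch of what separating the checkpoint and data directories might look like for the `fileChannel` in the config quoted below. The mount points `/disk1`, `/disk2`, and `/disk3` are hypothetical placeholders for separate physical disks (or an SSD). `checkpointDir` and `dataDirs` are the file channel's own properties, and `dataDirs` accepts a comma-separated list, so the log files can be spread over more than one disk.

```properties
agent.channels.fileChannel.type = file
# Hypothetical mount point: keep the checkpoint on its own disk
agent.channels.fileChannel.checkpointDir = /disk1/flume/checkpoint
# Hypothetical mount points: data log files on different disks from the checkpoint;
# listing several directories on separate disks spreads the write load
agent.channels.fileChannel.dataDirs = /disk2/flume/data,/disk3/flume/data
```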

On 09/30/2013 11:11 AM, Anat Rozenzon wrote:

I'm trying to read 100 MB of files using a spooling directory source, a file channel, and four Avro sinks that send to an Avro source running in another Flume process.
Both Flume processes are running on the same machine, to eliminate network issues.

However, it takes more than 5 minutes to read and pass the 100 MB of data, which is too slow for our needs.

After about one minute the files have all been read into the file channel, and then there is quite a long period during which the four sinks drain the file channel very slowly.

Copying the same data using scp from a remote machine takes 7 seconds.
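
To put the gap in numbers, a rough back-of-the-envelope comparison, treating "more than 5 minutes" as a lower bound of 300 s and ignoring any difference in what the two paths actually transfer:

```latex
\[
\text{Flume: } \frac{100\ \text{MB}}{300\ \text{s}} \approx 0.33\ \text{MB/s (at best)}, \qquad
\text{scp: } \frac{100\ \text{MB}}{7\ \text{s}} \approx 14.3\ \text{MB/s}, \qquad
\frac{300\ \text{s}}{7\ \text{s}} \approx 43
\]
```

In other words, scp is at least roughly 40 times faster than this Flume pipeline.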

Below is my config. Is there anything I can do to improve this?

agent.sources = logsdir
agent.sources.logsdir.type = spooldir
agent.sources.logsdir.channels = fileChannel
agent.sources.logsdir.spoolDir = %%WORK_DIR%%
agent.sources.logsdir.fileHeader = true
agent.sources.logsdir.interceptors = ihost iserver_type iserver_id
agent.sources.logsdir.interceptors.ihost.type = host
agent.sources.logsdir.interceptors.ihost.useIP = false
agent.sources.logsdir.interceptors.ihost.hostHeader = server_hostname

agent.sources.logsdir.interceptors.iserver_type.type = static
agent.sources.logsdir.interceptors.iserver_type.key = server_type
agent.sources.logsdir.interceptors.iserver_type.value = %%SERVER_TYPE%%
agent.sources.logsdir.interceptors.iserver_id.type = static
agent.sources.logsdir.interceptors.iserver_id.key = server_id
agent.sources.logsdir.interceptors.iserver_id.value = %%SERVER_ID%%

agent.sources.logsdir.deserializer.maxLineLength = 10240
agent.channels = fileChannel
agent.channels.fileChannel.type = file

## Send to multiple collectors for load balancing
agent.sinks = AvroSink1-1 AvroSink1-2 AvroSink1-3 AvroSink1-4

agent.sinks.AvroSink1-1.type = avro
agent.sinks.AvroSink1-1.channel = fileChannel
agent.sinks.AvroSink1-1.hostname = %%COLLECTOR1_SERVER%%
agent.sinks.AvroSink1-1.port = 4545%%COLLECTOR1_SLOT%%
agent.sinks.AvroSink1-1.connect-timeout = 60000
agent.sinks.AvroSink1-1.request-timeout = 60000
agent.sinks.AvroSink1-1.batch-size = 1000

agent.sinks.AvroSink1-2.type = avro
agent.sinks.AvroSink1-2.channel = fileChannel
agent.sinks.AvroSink1-2.hostname = %%COLLECTOR1_SERVER%%
agent.sinks.AvroSink1-2.port = 4545%%COLLECTOR1_SLOT%%
agent.sinks.AvroSink1-2.connect-timeout = 60000
agent.sinks.AvroSink1-2.request-timeout = 60000
agent.sinks.AvroSink1-2.batch-size = 1000

agent.sinks.AvroSink1-3.type = avro
agent.sinks.AvroSink1-3.channel = fileChannel
agent.sinks.AvroSink1-3.hostname = %%COLLECTOR1_SERVER%%
agent.sinks.AvroSink1-3.port = 4545%%COLLECTOR1_SLOT%%
agent.sinks.AvroSink1-3.connect-timeout = 60000
agent.sinks.AvroSink1-3.request-timeout = 60000
agent.sinks.AvroSink1-3.batch-size = 1000

agent.sinks.AvroSink1-4.type = avro
agent.sinks.AvroSink1-4.channel = fileChannel
agent.sinks.AvroSink1-4.hostname = %%COLLECTOR1_SERVER%%
agent.sinks.AvroSink1-4.port = 4545%%COLLECTOR1_SLOT%%
agent.sinks.AvroSink1-4.connect-timeout = 60000
agent.sinks.AvroSink1-4.request-timeout = 60000
agent.sinks.AvroSink1-4.batch-size = 1000


