Flume >> mail # user >> Avro sink to source is too slow

Re: Avro sink to source is too slow
AFAIK we have a fast disk.
However, I think the problem is with Avro and not the channel: as you can
see in the metrics below, the channel filled quickly but is draining very
slowly. After a few minutes of running, only 70-80 batches had been sent by each sink.
On Mon, Sep 30, 2013 at 7:21 PM, Mike Keane <[EMAIL PROTECTED]> wrote:

> What kind of disk configuration do you have for your file channel?  With a
> single-disk configuration (Dell blade server) performance was awful.  I believe
> Flume needs, at a minimum, separate disks for the checkpoint and data
> directories.  When I switched to an SSD or a 13-disk RAID setup my problems
> went away, with one exception: compression was still very slow.  I ended
> up distributing my flow over several file channels to get good throughput
> with compression.
> -Mike
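
A minimal sketch of the disk layout Mike describes, assuming the agent and channel names from the config quoted further down (`agent`, `fileChannel`). The `/disk1` and `/disk2` mount points are hypothetical placeholders for two separate physical disks. `checkpointDir` and `dataDirs` are the file channel properties for the checkpoint and data directories.

```properties
# Sketch only: /disk1 and /disk2 are hypothetical mount points on separate physical disks.
agent.channels = fileChannel
agent.channels.fileChannel.type = file
# Checkpoint directory on its own disk
agent.channels.fileChannel.checkpointDir = /disk1/flume/checkpoint
# Data directory on a different disk; dataDirs accepts a comma-separated list
agent.channels.fileChannel.dataDirs = /disk2/flume/data
```

Whether this helps depends on where the bottleneck is. If the channel fills quickly and only the sinks are slow, the disk layout may not be the limiting factor.
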
> On 09/30/2013 11:11 AM, Anat Rozenzon wrote:
> Hi,
> I'm trying to read 100MB of files using a spooling directory source, a file
> channel and 4 Avro sinks into an Avro source running in another Flume process.
> Both Flume processes are running on the same machine, just to rule out
> network issues.
> However, it takes more than 5 minutes to read and pass the 100MB of data, which
> is too slow for our needs.
> After about 1 minute the files have been read into the file channel, and then
> there is quite a long period during which the file channel drains really slowly
> through the four sinks.
> Copying the same data using scp from a remote machine takes 7 seconds.
> Below is my config. Is there anything I can do to improve this?
> agent.sources = logsdir
> agent.sources.logsdir.type = spooldir
> agent.sources.logsdir.channels = fileChannel
> agent.sources.logsdir.spoolDir = %%WORK_DIR%%
> agent.sources.logsdir.fileHeader = true
> agent.sources.logsdir.batchSize = 1000
> agent.sources.logsdir.deletePolicy = immediate
> agent.sources.logsdir.interceptors = ihost iserver_type iserver_id
> agent.sources.logsdir.interceptors.ihost.type = host
> agent.sources.logsdir.interceptors.ihost.useIP = false
> agent.sources.logsdir.interceptors.ihost.hostHeader = server_hostname
> agent.sources.logsdir.interceptors.iserver_type.type = static
> agent.sources.logsdir.interceptors.iserver_type.key = server_type