NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Flume >> mail # user >> Avro sink to source is too slow


Re: Avro sink to source is too slow
AFAIK, we have a fast disk.
However, I think the problem is with Avro and not the channel: as you can
see in the metrics below, the channel filled up quickly but is draining very
slowly.
After a few minutes of running, each sink had sent only 74 to 81 batches.
{
"SINK.AvroSink1-4":{"BatchCompleteCount":"74","ConnectionFailedCount":"0","EventDrainAttemptCount":"74000","ConnectionCreatedCount":"3","Type":"SINK","BatchEmptyCount":"1","ConnectionClosedCount":"2","EventDrainSuccessCount":"71000","StopTime":"0","StartTime":"1380568140738","BatchUnderflowCount":"0"},
"SOURCE.logsdir":{"OpenConnectionCount":"0","Type":"SOURCE","AppendBatchAcceptedCount":"1330","AppendBatchReceivedCount":"1330","EventAcceptedCount":"1326298","AppendReceivedCount":"0","StopTime":"0","StartTime":"1380568140830","EventReceivedCount":"1326298","AppendAcceptedCount":"0"},
"CHANNEL.fileChannel":{"EventPutSuccessCount":"1326298","ChannelFillPercentage":"51.314899999999994","Type":"CHANNEL","StopTime":"0","EventPutAttemptCount":"1326298","ChannelSize":"1026298","StartTime":"1380568140730","EventTakeSuccessCount":"300000","ChannelCapacity":"2000000","EventTakeAttemptCount":"310073"},
"SINK.AvroSink1-2":{"BatchCompleteCount":"78","ConnectionFailedCount":"0","EventDrainAttemptCount":"78000","ConnectionCreatedCount":"3","Type":"SINK","BatchEmptyCount":"1","ConnectionClosedCount":"2","EventDrainSuccessCount":"75000","StopTime":"0","StartTime":"1380568140736","BatchUnderflowCount":"0"},
"SINK.AvroSink1-3":{"BatchCompleteCount":"81","ConnectionFailedCount":"0","EventDrainAttemptCount":"81000","ConnectionCreatedCount":"3","Type":"SINK","BatchEmptyCount":"1","ConnectionClosedCount":"2","EventDrainSuccessCount":"79000","StopTime":"0","StartTime":"1380568140736","BatchUnderflowCount":"0"},
"SINK.AvroSink1-1":{"BatchCompleteCount":"77","ConnectionFailedCount":"0","EventDrainAttemptCount":"77000","ConnectionCreatedCount":"2","Type":"SINK","BatchEmptyCount":"1","ConnectionClosedCount":"1","EventDrainSuccessCount":"75000","StopTime":"0","StartTime":"1380568140734","BatchUnderflowCount":"0"}}
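A quick consistency check on the counters above. All numbers are copied from the JSON; the batch size of 1,000 is inferred from EventDrainAttemptCount / BatchCompleteCount rather than read from the (truncated) config:

```latex
\begin{aligned}
\text{ChannelSize} &= \text{EventPutSuccessCount} - \text{EventTakeSuccessCount} \\
  &= 1\,326\,298 - 300\,000 = 1\,026\,298 \\
\text{ChannelFillPercentage} &= \frac{1\,026\,298}{2\,000\,000} \times 100 \approx 51.31\% \\
\sum_{k=1}^{4} \text{EventDrainSuccessCount}_k &= 75\,000 + 75\,000 + 79\,000 + 71\,000 = 300\,000 = \text{EventTakeSuccessCount} \\
\frac{\text{EventDrainAttemptCount}}{\text{BatchCompleteCount}} &= \frac{74\,000}{74} = \frac{78\,000}{78} = \frac{81\,000}{81} = \frac{77\,000}{77} = 1\,000 \text{ events per batch}
\end{aligned}
```

So every event taken from the channel is accounted for by one of the four sinks, and the backlog of roughly one million events is sitting in the file channel. Each sink's EventDrainAttemptCount also exceeds its EventDrainSuccessCount by 2,000 to 3,000 events (two or three batches), so a few attempted batches were not counted as successful. The counters alone don't show whether the time is spent on the sink side or in the downstream Avro source.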
On Mon, Sep 30, 2013 at 7:21 PM, Mike Keane <[EMAIL PROTECTED]> wrote:

> What kind of disk configuration does your file channel use? With a
> single-disk configuration (Dell blade server), performance was awful. I
> believe that, at a minimum, Flume needs separate disks for the checkpoint
> and data directories. When I switched to an SSD or a 13-disk RAID setup, my
> problems went away, with one exception: compression was still very slow. I
> ended up distributing my flow over several file channels to get good
> throughput with compression.
>
> -Mike
>
>
> On 09/30/2013 11:11 AM, Anat Rozenzon wrote:
> Hi
>
> I'm trying to read 100 MB of files using the spooling directory source, a
> file channel, and 4 Avro sinks that send to an Avro source running in another
> Flume process. Both Flume processes are running on the same machine, to rule
> out network issues.
>
> However, it takes more than 5 minutes to read and pass the 100 MB of data,
> which is too slow for our needs.
>
> After about 1 minute the files have been read into the file channel; after
> that, the four sinks spend quite a long time draining the file channel very
> slowly.
>
> Copying the same data using scp from a remote machine takes 7 seconds.
>
> Below is my config. Is there anything I can do to improve this?
>
> agent.sources = logsdir
> agent.sources.logsdir.type = spooldir
> agent.sources.logsdir.channels = fileChannel
> agent.sources.logsdir.spoolDir = %%WORK_DIR%%
> agent.sources.logsdir.fileHeader = true
> agent.sources.logsdir.batchSize=1000
> agent.sources.logsdir.deletePolicy=immediate
> agent.sources.logsdir.interceptors =  ihost iserver_type iserver_id
> agent.sources.logsdir.interceptors.ihost.type = host
> agent.sources.logsdir.interceptors.ihost.useIP = false
> agent.sources.logsdir.interceptors.ihost.hostHeader = server_hostname
>
> agent.sources.logsdir.interceptors.iserver_type.type = static
> agent.sources.logsdir.interceptors.iserver_type.key = server_type
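Mike's suggestion of putting the file channel's checkpoint and data directories on separate disks could look roughly like the fragment below. This is a sketch, not either poster's actual setup: the mount points /disk1 through /disk3 are hypothetical, the channel name reuses fileChannel from Anat's config, and the capacity simply mirrors the ChannelCapacity of 2000000 reported in the metrics.

```properties
# Sketch only: /disk1, /disk2 and /disk3 are hypothetical mount points,
# each assumed to be a separate physical disk.
agent.channels = fileChannel
agent.channels.fileChannel.type = file
# Checkpoint directory on its own disk.
agent.channels.fileChannel.checkpointDir = /disk1/flume/checkpoint
# Data directories on other disks; dataDirs takes a comma-separated list.
agent.channels.fileChannel.dataDirs = /disk2/flume/data,/disk3/flume/data
# Matches the ChannelCapacity value seen in the metrics.
agent.channels.fileChannel.capacity = 2000000
```

The Flume user guide notes that using multiple data directories on separate disks can improve file channel performance. Whether it helps here depends on the disk actually being the bottleneck, which Anat doubts given the metrics.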