Flume >> mail # user >> Avro sink to source is too slow


Re: Avro sink to source is too slow
AFAIK we have a fast disk.
However, I think the problem is with Avro and not the channel: as you can
see in the metrics below, the channel filled up quickly but is draining very
slowly.
After a few minutes of running, only 70-80 batches had been sent by each sink.
{
"SINK.AvroSink1-4":{"BatchCompleteCount":"74","ConnectionFailedCount":"0","EventDrainAttemptCount":"74000","ConnectionCreatedCount":"3","Type":"SINK","BatchEmptyCount":"1","ConnectionClosedCount":"2","EventDrainSuccessCount":"71000","StopTime":"0","StartTime":"1380568140738","BatchUnderflowCount":"0"},
"SOURCE.logsdir":{"OpenConnectionCount":"0","Type":"SOURCE","AppendBatchAcceptedCount":"1330","AppendBatchReceivedCount":"1330","EventAcceptedCount":"1326298","AppendReceivedCount":"0","StopTime":"0","StartTime":"1380568140830","EventReceivedCount":"1326298","AppendAcceptedCount":"0"},
"CHANNEL.fileChannel":{"EventPutSuccessCount":"1326298","ChannelFillPercentage":"51.314899999999994","Type":"CHANNEL","StopTime":"0","EventPutAttemptCount":"1326298","ChannelSize":"1026298","StartTime":"1380568140730","EventTakeSuccessCount":"300000","ChannelCapacity":"2000000","EventTakeAttemptCount":"310073"},
"SINK.AvroSink1-2":{"BatchCompleteCount":"78","ConnectionFailedCount":"0","EventDrainAttemptCount":"78000","ConnectionCreatedCount":"3","Type":"SINK","BatchEmptyCount":"1","ConnectionClosedCount":"2","EventDrainSuccessCount":"75000","StopTime":"0","StartTime":"1380568140736","BatchUnderflowCount":"0"},
"SINK.AvroSink1-3":{"BatchCompleteCount":"81","ConnectionFailedCount":"0","EventDrainAttemptCount":"81000","ConnectionCreatedCount":"3","Type":"SINK","BatchEmptyCount":"1","ConnectionClosedCount":"2","EventDrainSuccessCount":"79000","StopTime":"0","StartTime":"1380568140736","BatchUnderflowCount":"0"},
"SINK.AvroSink1-1":{"BatchCompleteCount":"77","ConnectionFailedCount":"0","EventDrainAttemptCount":"77000","ConnectionCreatedCount":"2","Type":"SINK","BatchEmptyCount":"1","ConnectionClosedCount":"1","EventDrainSuccessCount":"75000","StopTime":"0","StartTime":"1380568140734","BatchUnderflowCount":"0"}}
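A quick sanity check on these counters makes the imbalance concrete: the channel backlog is the puts minus the takes, and the per-sink drain counts should add up to the channel's take count. A minimal sketch with the values copied from the metrics above (the ~300 s run length is an assumption based on "a few minutes of running"):

```python
# Counters copied from the Flume JSON metrics above.
put_success = 1326298                       # CHANNEL.fileChannel EventPutSuccessCount
take_success = 300000                       # CHANNEL.fileChannel EventTakeSuccessCount
sink_drain = [71000, 75000, 79000, 75000]   # EventDrainSuccessCount per AvroSink

backlog = put_success - take_success
print("events still in channel:", backlog)   # matches ChannelSize = 1026298
print("drained by all sinks:", sum(sink_drain))  # equals take_success = 300000

# Assuming the run lasted roughly 300 seconds ("a few minutes"):
run_seconds = 300
print("approx aggregate drain rate (events/s):", sum(sink_drain) / run_seconds)
```

So the source pushed ~1.3M events in while the four sinks together drained only 300K, i.e. on the order of 1000 events/s total under the 300 s assumption, which is why the channel fill percentage climbed past 50%.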
On Mon, Sep 30, 2013 at 7:21 PM, Mike Keane <[EMAIL PROTECTED]> wrote:

> What kind of disk configuration is on your file channel?  With a single-disk
> configuration (Dell blade server) performance was awful.  I believe what
> Flume needs at a minimum is a separate disk for the checkpoint and data
> directories.  When I switched to an SSD or a 13-disk RAID setup my problems
> went away, with one exception: compression was still very slow.  I ended
> up distributing my flow over several file channels to get good throughput
> with compression.
>
> -Mike
>
>
> On 09/30/2013 11:11 AM, Anat Rozenzon wrote:
> Hi
>
> I'm trying to read 100MB of files using a directory spooler, a file channel,
> and 4 Avro sinks into an Avro source running on another Flume process.
> Both Flume processes are running on the same machine to rule out
> network issues.
>
> However, it takes more than 5 minutes to read & pass the 100MB of data,
> which is too slow for our needs.
>
> After about 1 minute the files have been read into the file channel, and
> then there is quite a long period where the file channel drains very slowly
> through the four sinks.
>
> Copying the same data using scp from a remote machine takes 7 seconds.
>
> Below is my config; is there anything I can do to improve it?
>
> agent.sources = logsdir
> agent.sources.logsdir.type = spooldir
> agent.sources.logsdir.channels = fileChannel
> agent.sources.logsdir.spoolDir = %%WORK_DIR%%
> agent.sources.logsdir.fileHeader = true
> agent.sources.logsdir.batchSize=1000
> agent.sources.logsdir.deletePolicy=immediate
> agent.sources.logsdir.interceptors =  ihost iserver_type iserver_id
> agent.sources.logsdir.interceptors.ihost.type = host
> agent.sources.logsdir.interceptors.ihost.useIP = false
> agent.sources.logsdir.interceptors.ihost.hostHeader = server_hostname
>
> agent.sources.logsdir.interceptors.iserver_type.type = static
> agent.sources.logsdir.interceptors.iserver_type.key = server_type
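The quoted config shows only the source side. For reference, the Avro sink in Flume 1.x also exposes a `batch-size` property (default 100), which is one of the usual tuning knobs for a slow drain. A hypothetical sketch of one sink's config, not taken from the thread (sink name, hostname, and port are assumptions; the metrics above show 1000 events per batch attempt, suggesting a batch size of 1000 was already set):

```
# Hypothetical Avro sink config -- names and port are illustrative only.
agent.sinks = AvroSink1-1
agent.sinks.AvroSink1-1.type = avro
agent.sinks.AvroSink1-1.channel = fileChannel
agent.sinks.AvroSink1-1.hostname = localhost
agent.sinks.AvroSink1-1.port = 4545
# batch-size defaults to 100; the BatchCompleteCount vs. EventDrainAttemptCount
# ratio in the metrics above implies ~1000 events per batch in this setup.
agent.sinks.AvroSink1-1.batch-size = 1000
```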