Flume >> mail # user >> Re: Avro sink to source is too slow

Mike Keane 2013-09-30, 16:21
Anat Rozenzon 2013-09-30, 19:29
Anat Rozenzon 2013-09-30, 16:10
Mike Keane 2013-09-30, 20:02
Re: Avro sink to source is too slow
   Can you give details on the second Flume agent?  For measuring, I
suggest you:
- switch to a memory channel on both agents
- make your target destination a separate disk (or a different host with a
fast network)

It seems like there may be too many components contending for the same disk
(spool source, file channels, and sinks on the 2nd agent).
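A minimal sketch of that measurement setup, assuming hypothetical agent and
component names (none of these names come from the thread):

```properties
# Sketch only: while measuring, replace the file channel with a memory
# channel to take disk contention out of the picture. "agent1",
# "memChannel", "spoolSrc", and "avroSink" are illustrative names.
agent1.channels = memChannel
agent1.channels.memChannel.type = memory
agent1.channels.memChannel.capacity = 1000000
agent1.channels.memChannel.transactionCapacity = 10000

# Point the existing source and sink at the memory channel.
agent1.sources.spoolSrc.channels = memChannel
agent1.sinks.avroSink.channel = memChannel
```

If throughput jumps with the memory channel, the bottleneck is the disk
shared by the spool source and file channel, not the Avro hop.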


On Mon, Sep 30, 2013 at 1:02 PM, Mike Keane <[EMAIL PROTECTED]> wrote:

> As far as a fast disk: if you only have one, the drive head will be seeking
> constantly and performance will be awful; we were having problems at 10,000
> log lines per second.  I've pushed over 270,000 lines per second compressed.
> I don't think it is Avro; I'm able to saturate a gigabit line easily, so
> ~100 MB/second of compressed data.
> I don't see a sink group in your configuration, so I'm curious what the
> default behavior is when you tie multiple sinks to a file channel without a
> sink group.  That said, I found performance issues using a single file
> channel with compression.  To get maximum performance I put a header on my
> events called "channel".  Since our servers are all numbered, I was able to
> take (server# mod 6) + 1 and make that the value of the "channel" header,
> thus getting a fairly even distribution of log data.  On my source I send
> data by the "channel" header to the appropriate channel.  This parallelized
> the compression across 6 file channels.  I then have 3 sinks per channel
> using a failover sink group.   Also, do you need compression level 9?  I've
> found the gains from a higher compression level are negligible compared to
> the performance expense (not with Flume/deflate specifically, but in
> general).  I found that even with the compression level turned down to 1 my
> sink ran 6-7 times slower; my solution was to parallelize the compression,
> and by trial and error I found this to be the best setup.
> agentName.sources.collector_source.selector.type = multiplexing
> agentName.sources.collector_source.selector.header = channel
> agentName.sources.collector_source.selector.mapping.1 = channel_1
> agentName.sources.collector_source.selector.mapping.2 = channel_2
> agentName.sources.collector_source.selector.mapping.3 = channel_3
> agentName.sources.collector_source.selector.mapping.4 = channel_4
> agentName.sources.collector_source.selector.mapping.5 = channel_5
> agentName.sources.collector_source.selector.default = channel_6
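
The selector config above covers the routing half of the layout Mike
describes; a hedged sketch of the complementary half (the channel list and
one of the 3-sink failover groups) might look like the following. Every name
other than agentName and channel_N is invented for illustration:

```properties
# Sketch only: six file channels to parallelize compression, plus a
# failover sink group per channel (shown for channel_1 only; a full
# config would declare one group per channel in agentName.sinkgroups).
agentName.channels = channel_1 channel_2 channel_3 channel_4 channel_5 channel_6
agentName.channels.channel_1.type = file

agentName.sinkgroups = sg_1
agentName.sinkgroups.sg_1.sinks = sink_1a sink_1b sink_1c
agentName.sinkgroups.sg_1.processor.type = failover
agentName.sinkgroups.sg_1.processor.priority.sink_1a = 10
agentName.sinkgroups.sg_1.processor.priority.sink_1b = 5
agentName.sinkgroups.sg_1.processor.priority.sink_1c = 1
agentName.sinkgroups.sg_1.processor.maxpenalty = 10000
```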
> -Mike
> On 09/30/2013 02:30 PM, Anat Rozenzon wrote:
> AFAIK we have a fast disk.
> However, I think the problem is with Avro and not the channel: as you can
> see in the metrics below, the channel filled quickly but is draining very
> slowly.
> After a few minutes of running, only 70-80 batches had been sent by each sink.
> {
> "SINK.AvroSink1-4":{"BatchCompleteCount":"74","ConnectionFailedCount":"0","EventDrainAttemptCount":"74000","ConnectionCreatedCount":"3","Type":"SINK","BatchEmptyCount":"1","ConnectionClosedCount":"2","EventDrainSuccessCount":"71000","StopTime":"0","StartTime":"1380568140738","BatchUnderflowCount":"0"},
> "SOURCE.logsdir":{"OpenConnectionCount":"0","Type":"SOURCE","AppendBatchAcceptedCount":"1330","AppendBatchReceivedCount":"1330","EventAcceptedCount":"1326298","AppendReceivedCount":"0","StopTime":"0","StartTime":"1380568140830","EventReceivedCount":"1326298","AppendAcceptedCount":"0"},
> "CHANNEL.fileChannel":{"EventPutSuccessCount":"1326298","ChannelFillPercentage":"51.314899999999994","Type":"CHANNEL","StopTime":"0","EventPutAttemptCount":"1326298","ChannelSize":"1026298","StartTime":"1380568140730","EventTakeSuccessCount":"300000","ChannelCapacity":"2000000","EventTakeAttemptCount":"310073"},
> "SINK.AvroSink1-2":{"BatchCompleteCount":"78","ConnectionFailedCount":"0","EventDrainAttemptCount":"78000","ConnectionCreatedCount":"3","Type":"SINK","BatchEmptyCount":"1","ConnectionClosedCount":"2","EventDrainSuccessCount":"75000","StopTime":"0","StartTime":"1380568140736","BatchUnderflowCount":"0"},
> "SINK.AvroSink1-3":{"BatchCompleteCount":"81","ConnectionFailedCount":"0","EventDrainAttemptCount":"81000","ConnectionCreatedCount":"3","Type":"SINK","BatchEmptyCount":"1","ConnectionClosedCount":"2","EventDrainSuccessCount":"79000","StopTime":"0","StartTime":"1380568140736","BatchUnderflowCount":"0"},
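
The quoted counters can be cross-checked directly; this is an editor's
sketch using only the numbers reported above (nothing here is Flume API):

```python
# Cross-check the quoted fileChannel counters: the backlog implied by
# puts minus takes should equal the reported ChannelSize.
channel = {
    "EventPutSuccessCount": "1326298",   # events the source wrote
    "EventTakeSuccessCount": "300000",   # events the sinks drained
    "ChannelSize": "1026298",            # events still in the channel
}

puts = int(channel["EventPutSuccessCount"])
takes = int(channel["EventTakeSuccessCount"])
backlog = puts - takes

print(f"drained {takes / puts:.0%} of accepted events; backlog = {backlog}")
assert backlog == int(channel["ChannelSize"])
```

The sinks have drained well under a quarter of what the source accepted,
which is consistent with Anat's reading that the channel fills much faster
than the Avro sinks empty it.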

Anat Rozenzon 2013-10-01, 12:41
Roshan Naik 2013-10-01, 20:10
Anat Rozenzon 2013-10-03, 10:43
Hari Shreedharan 2013-10-03, 15:28
Anat Rozenzon 2013-10-03, 19:54