Flume, mail # user - Avro sink to source is too slow


Re: Avro sink to source is too slow
Hari Shreedharan 2013-10-03, 15:28
Yes. Using multiple sinks with no sink groups would give each sink its own
thread. Each time you add a channel to a source you will take some
performance hit, because the channels are written to one after the
other. Also, were these channels sharing disks? Were the checkpoint and data
files for each of them on separate disks?
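
For illustration, a minimal sketch of the layout described above: two sinks draining one file channel with no sink group defined, and the channel's checkpoint and data directories on separate disks. The agent name `collector`, the component names, hosts, ports, and paths are all assumptions made up for this example, not taken from the thread.

```properties
# Hypothetical agent "collector"; every name, host, port, and path here is an assumption.
collector.sources = avroIn
collector.channels = fc1
collector.sinks = k1 k2

# Avro source feeding a single file channel.
collector.sources.avroIn.type = avro
collector.sources.avroIn.bind = 0.0.0.0
collector.sources.avroIn.port = 4141
collector.sources.avroIn.channels = fc1

# File channel with its checkpoint and data directories on different disks.
collector.channels.fc1.type = file
collector.channels.fc1.checkpointDir = /disk1/flume/fc1/checkpoint
collector.channels.fc1.dataDirs = /disk2/flume/fc1/data

# Two sinks attached to the same channel. No "collector.sinkgroups" entry
# is defined, so each sink drains the channel on its own thread.
collector.sinks.k1.type = avro
collector.sinks.k1.hostname = downstream.example.com
collector.sinks.k1.port = 4545
collector.sinks.k1.channel = fc1

collector.sinks.k2.type = avro
collector.sinks.k2.hostname = downstream.example.com
collector.sinks.k2.port = 4545
collector.sinks.k2.channel = fc1
```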

On Thursday, October 3, 2013, Anat Rozenzon wrote:

> Just a quick update, I found two issues that slowed down Flume:
> 1. Using 3 replicating file channels on the Avro source slowed down the
> acceptance of Flume events; it takes up to 5-10 times longer than writing to
> one channel. So I'm now trying to change the collector's configuration to 1
> file channel and then a spooldir source that will read out of the
> collector's file system and into a memory channel for replication.
> 2. More disturbing is that I see many disconnections in the Avro sink-source
> pair while the source Flume agent (e.g. the collector) is doing full GCs; also, the
> full GCs were quite long (~15 seconds). Changing Java to a non-hanging GC
> (i.e. G1) solved this issue as well.
>
> BTW, regarding Mike's question above:
> What is the correct way to set up multiple threads that will drain a channel
> quickly?
> I thought the correct way is simply to attach multiple sinks to
> the same channel, without any sink groups. Is that correct?
>
> Thanks
> Anat
>
>
> On Tue, Oct 1, 2013 at 11:10 PM, Roshan Naik <[EMAIL PROTECTED]> wrote:
>
>> My thoughts... You have 4 sinks draining the same channel and each has a
>> batch size of 1000. Since they will contend on the same channel and, *assuming*
>> events are evenly distributed among the sinks, there is potential for some
>> starvation in the sinks, as their batch sizes may not be reached
>> until about 4 batches are inserted by the source. I don't know if there is
>> a good rule of thumb here.
>>
>> Try these:
>> -  See if a sink batch size of 250 helps.
>> -  Use a single Avro sink instead of 4, with a batch size of 1k.
>> -  Replace the Avro sink with the null sink on the first agent and
>> take a measurement. It would be good to ensure the spool source is not the
>> bottleneck.
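
The experiments suggested in the oldest quoted message could be set up roughly as follows. This is only a sketch: the agent name `agent1`, the component names, the spool directory, and the downstream host and port are assumptions for illustration, and the channel type of the first agent is not stated in the thread, so a file channel is assumed here.

```properties
# Hypothetical first agent "agent1"; all names, paths, hosts, and ports are assumptions.
agent1.sources = spool
agent1.channels = ch1
agent1.sinks = k1

agent1.sources.spool.type = spooldir
agent1.sources.spool.spoolDir = /var/spool/flume-in
agent1.sources.spool.channels = ch1

# Channel type is an assumption; the thread does not say which one the first agent uses.
agent1.channels.ch1.type = file

# Experiment A: keep the Avro sinks but lower each sink's batch size to 250.
# Experiment B: a single Avro sink, as below, with batch-size = 1000.
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = collector.example.com
agent1.sinks.k1.port = 4141
agent1.sinks.k1.batch-size = 250
agent1.sinks.k1.channel = ch1

# Experiment C: swap the Avro sink for a null sink to measure how fast the
# spooling directory source alone can feed the channel.
# agent1.sinks.k1.type = null
# agent1.sinks.k1.channel = ch1
```

Running one experiment at a time and comparing event throughput should show whether the sinks, the network hop, or the spooling directory source is the limiting factor.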