Flume, mail # user - Avro sink to source is too slow


Re: Avro sink to source is too slow
Anat Rozenzon 2013-10-03, 10:43
Just a quick update: I found two issues that were slowing Flume down.
1. Replicating from the Avro source into 3 file channels slowed down the
acceptance of Flume events; it takes 5-10 times longer than writing to
one channel. So I'm now trying to change the collector's configuration to 1
file channel, plus a spooldir source that reads from the collector's file
system into a memory channel for replication.
2. More disturbing, I see many disconnections in the Avro sink-source
pair while the source-side Flume agent (the collector, in this case) is doing
full GCs, and the full GCs were quite long (~15 seconds). Switching Java to a
low-pause collector (i.e. G1) solved this issue.
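
For reference, the fan-out described in point 1 looks roughly like the sketch below: one Avro source replicating every event into three file channels. The agent name, component names, port and directories are placeholders chosen for illustration, not the actual settings.

```properties
# Sketch only: collector, avroIn, fc1..fc3, the port and all paths are hypothetical.
collector.sources = avroIn
collector.channels = fc1 fc2 fc3

collector.sources.avroIn.type = avro
collector.sources.avroIn.bind = 0.0.0.0
collector.sources.avroIn.port = 4141
# Replicating selector: each event is written to every channel listed below
collector.sources.avroIn.selector.type = replicating
collector.sources.avroIn.channels = fc1 fc2 fc3

# Three durable file channels, each with its own checkpoint and data directories
collector.channels.fc1.type = file
collector.channels.fc1.checkpointDir = /data/flume/fc1/checkpoint
collector.channels.fc1.dataDirs = /data/flume/fc1/data

collector.channels.fc2.type = file
collector.channels.fc2.checkpointDir = /data/flume/fc2/checkpoint
collector.channels.fc2.dataDirs = /data/flume/fc2/data

collector.channels.fc3.type = file
collector.channels.fc3.checkpointDir = /data/flume/fc3/checkpoint
collector.channels.fc3.dataDirs = /data/flume/fc3/data
```

With this layout each incoming batch has to be written to all three file channels before the source can acknowledge it, which would be consistent with the slowdown compared to a single channel.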

BTW, regarding Mike's question above:
What is the correct way to set up multiple threads that will drain a channel
quickly?
I thought the way to do it is simply to configure multiple sinks on
the same channel, without any sink groups. Is that correct?
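
To make the question concrete, here is a minimal sketch of what I mean: two Avro sinks attached to the same channel, with no sink group declared. The agent name, sink and channel names, hostname and port are placeholders, and the batch size is only an example value.

```properties
# Sketch only: agent1, ch1, k1, k2, the hostname and the port are hypothetical.
agent1.channels = ch1
agent1.sinks = k1 k2
# Note: no agent1.sinkgroups entry, so the sinks are not grouped

agent1.sinks.k1.type = avro
agent1.sinks.k1.channel = ch1
agent1.sinks.k1.hostname = collector.example.com
agent1.sinks.k1.port = 4141
# Example value; tune per workload
agent1.sinks.k1.batch-size = 250

agent1.sinks.k2.type = avro
agent1.sinks.k2.channel = ch1
agent1.sinks.k2.hostname = collector.example.com
agent1.sinks.k2.port = 4141
agent1.sinks.k2.batch-size = 250
```

My understanding is that each ungrouped sink gets its own runner thread, so k1 and k2 would take batches from ch1 concurrently, whereas sinks placed in a sink group share one runner.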

Thanks
Anat
On Tue, Oct 1, 2013 at 11:10 PM, Roshan Naik <[EMAIL PROTECTED]> wrote:

> My thoughts... You have 4 sinks draining the same channel, and each has a
> batch size of 1000. Since they contend on the same channel, and *assuming*
> events are evenly distributed among the sinks, there is potential for some
> starvation in the sinks, as their batch sizes may not be reached
> until about 4 batches are inserted by the source. I don't know if there is
> a good rule of thumb here.
>
> Try these:
> - See if a sink batch size of 250 helps.
> - Use a single Avro sink with a batch size of 1k instead of 4 sinks.
> - Replace the Avro sink with the null sink on the first agent and take
> a measurement. It would be good to ensure the spool source is not the
> bottleneck.