Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Flume, mail # user - Architecting Flume for failover

Copy link to this message
Re: Architecting Flume for failover
Yogi Nerella 2013-02-19, 23:39
Hi Noel,

May be you are specifying  both sinkgroups and sinks.

Can you try removing the sinks.
#agent.sinks = hdfsSink-1 hdfsSink-2


On Tue, Feb 19, 2013 at 1:32 PM, Noel Duffy <[EMAIL PROTECTED]>wrote:

> I have a Flume agent that pulls events from RabbitMQ and pushes them into
> HDFS. So far so good, but now I want to have a second Flume agent on a
> different host acting as a hot backup for the first agent such that the
> loss of the first host running Flume would not cause any events to be lost.
> In the testing I've done I've gotten two Flume agents on separate hosts to
> read the same events from the RabbitMQ queue, but it's not clear to me how
> to configure the sinks such that only one of the sinks actually does
> something and the other does nothing.
> From reading the documentation, I supposed that a sinkgroup configured for
> failover was what I needed, but the documentation examples only cover the
> case where the sinks in a failover group are all on the same agent on the
> same host. I've seen messages online which seem to say that sinks in a
> sinkgroup can be on different hosts, but I can find no clear explanation of
> how to configure such a sinkgroup. How would sinks on different hosts
> communicate with one another? Would the sinks in the sinkgroup have to use
> a JDBC channel? Would the sinks have to be non-terminal sinks, like Avro?
> In my testing I set up two agents on different hosts and configured a
> sinkgroup containing two sinks, both HDFS sinks.
> agent.sinkgroups = sinkgroup1
> agent.sinkgroups.sinkgroup1.sinks = hdfsSink-1 hdfsSink-2
> agent.sinkgroups.sinkgroup1.processor.priority.hdfsSink-1 = 5
> agent.sinkgroups.sinkgroup1.processor.priority.hdfsSink-2 = 10
> agent.sinkgroups.sinkgroup1.processor.type=failover
> agent.sinks = hdfsSink-1 hdfsSink-2
> agent.sinks.hdfsSink-1.type = hdfs
> agent.sinks.hdfsSink-1.bind =
> agent.sinks.hdfsSink-1.channel = fileChannel-1
> agent.sinks.hdfsSink-1.hdfs.path = /flume/localbrain-events
> agent.sinks.hdfsSink-1.hdfs.filePrefix = lb-events
> agent.sinks.hdfsSink-1.hdfs.round = false
> agent.sinks.hdfsSink-1.hdfs.rollCount=50
> agent.sinks.hdfsSink-1.hdfs.fileType=SequenceFile
> agent.sinks.hdfsSink-1.hdfs.writeFormat=Text
> agent.sinks.hdfsSink-1.hdfs.codeC = lzo
> agent.sinks.hdfsSink-1.hdfs.rollInterval=30
> agent.sinks.hdfsSink-1.hdfs.rollSize=0
> agent.sinks.hdfsSink-1.hdfs.batchSize=1
> agent.sinks.hdfsSink-2.bind =
> agent.sinks.hdfsSink-2.type = hdfs
> agent.sinks.hdfsSink-2.channel = fileChannel-1
> agent.sinks.hdfsSink-2.hdfs.path = /flume/localbrain-events
> agent.sinks.hdfsSink-2.hdfs.filePrefix = lb-events
> agent.sinks.hdfsSink-2.hdfs.round = false
> agent.sinks.hdfsSink-2.hdfs.rollCount=50
> agent.sinks.hdfsSink-2.hdfs.fileType=SequenceFile
> agent.sinks.hdfsSink-2.hdfs.writeFormat=Text
> agent.sinks.hdfsSink-2.hdfs.codeC = lzo
> agent.sinks.hdfsSink-2.hdfs.rollInterval=30
> agent.sinks.hdfsSink-2.hdfs.rollSize=0
> agent.sinks.hdfsSink-2.hdfs.batchSize=1
> However, this does not achieve the failover I hoped for. The sink
> hdfsSink-2 on both agents writes the events to HDFS. The agents are not
> communicating, so the binding of the sink to an ip address is not doing
> anything.