Flume user mailing list: Architecting Flume for failover

RE: Architecting Flume for failover
The configuration file in its entirety:

agent.sources = rabbitmq-source1

agent.sinkgroups = sinkgroup1
agent.sinkgroups.sinkgroup1.sinks = hdfsSink-1 hdfsSink-2
agent.sinkgroups.sinkgroup1.processor.priority.hdfsSink-1 = 5
agent.sinkgroups.sinkgroup1.processor.priority.hdfsSink-2 = 10

agent.channels = fileChannel-1
agent.channels.fileChannel-1.type = file
agent.channels.fileChannel-1.checkpointDir = /var/flume/checkpoint
agent.channels.fileChannel-1.dataDirs = /var/flume/data

agent.sinks = hdfsSink-1 hdfsSink-2
agent.sinks.hdfsSink-1.type = hdfs
agent.sinks.hdfsSink-1.bind =
agent.sinks.hdfsSink-1.channel = fileChannel-1
agent.sinks.hdfsSink-1.hdfs.path = /flume/localbrain-events
agent.sinks.hdfsSink-1.hdfs.filePrefix = lb-events
agent.sinks.hdfsSink-1.hdfs.round = false
agent.sinks.hdfsSink-1.hdfs.codeC = lzo

agent.sinks.hdfsSink-2.bind =
agent.sinks.hdfsSink-2.type = hdfs
agent.sinks.hdfsSink-2.channel = fileChannel-1
agent.sinks.hdfsSink-2.hdfs.path = /flume/localbrain-events
agent.sinks.hdfsSink-2.hdfs.filePrefix = lb-events
agent.sinks.hdfsSink-2.hdfs.round = false
agent.sinks.hdfsSink-2.hdfs.codeC = lzo

agent.sources.rabbitmq-source1.channels = fileChannel-1
agent.sources.rabbitmq-source1.type = org.apache.flume.source.rabbitmq.RabbitMQSource
agent.sources.rabbitmq-source1.hostname =
agent.sources.rabbitmq-source1.exchangename = rtr.topic.logs
agent.sources.rabbitmq-source1.username = ***
agent.sources.rabbitmq-source1.password = ***
agent.sources.rabbitmq-source1.port = 5672
agent.sources.rabbitmq-source1.virtualhost = rtr_prod

From: Jeff Lord [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, 20 February 2013 2:46 p.m.
Subject: Re: Architecting Flume for failover

Maybe post your entire config?

On Tue, Feb 19, 2013 at 5:35 PM, Noel Duffy <[EMAIL PROTECTED]> wrote:
In my tests I set up two Flume agents connected to two different HDFS clusters. The configuration of both Flume agents is identical. They read events from the same RabbitMQ server. In my test, both agent hosts wrote the event to their respective HDFS servers using hdfsSink-2, but I expected the failover sinkgroup configuration would mean only one host would write the event. In other words, I thought that a failover sinkgroup could be configured to have sinks on different hosts but that only one sink on one host would actually write the event and that the other host would not do anything.

All the examples in the documentation have all sinks in a sinkgroup on a single host. I want to have the sinks on different hosts. I've seen a number of assertions online that this can be done, but so far, I've not seen any examples of how to actually configure it.

From: Jeff Lord [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, 20 February 2013 2:17 p.m.
Subject: Re: Architecting Flume for failover


What test did you perform?
Did you stop sink-2? 
Currently you have set a higher priority for sink-2 so it will be the default sink so long as it is up and running.
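
The priority behaviour described above can be sketched as a config fragment. This is a minimal sketch, not a confirmed fix: it reuses the sink and group names from the posted configuration and adds one line, `processor.type = failover`, which does not appear in that configuration (an assumption about what was intended). With the failover processor, the sink with the highest priority value is tried first, and the lower-priority sink is used only while the preferred one is failing.

```properties
# Sketch only: sink group using Flume's failover sink processor.
agent.sinkgroups = sinkgroup1
agent.sinkgroups.sinkgroup1.sinks = hdfsSink-1 hdfsSink-2

# Assumption: this line is absent from the posted config. Without it the
# group falls back to the default processor rather than failover.
agent.sinkgroups.sinkgroup1.processor.type = failover

# Higher number = higher priority, so hdfsSink-2 is preferred while healthy.
agent.sinkgroups.sinkgroup1.processor.priority.hdfsSink-1 = 5
agent.sinkgroups.sinkgroup1.processor.priority.hdfsSink-2 = 10
```

Note that a sink group is evaluated inside a single agent process. Two separately started agents with identical files each run their own copy of the group and do not coordinate with each other.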



On Tue, Feb 19, 2013 at 5:03 PM, Noel Duffy <[EMAIL PROTECTED]> wrote:
The IP addresses [elided] and [elided] are the addresses of the Flume agents. The first agent is on [elided], the second on [elided]. The idea was to have two sinks on different hosts and to configure Flume to fail over to the second host if the first host should disappear. Although the documentation does not say so explicitly, I have read posts online which say that such a configuration is possible. I am running Flume-ng 1.2.0.

It may be that I am approaching this problem in the wrong way. We need to have Flume reading events from RabbitMQ and writing them to HDFS. We want to have two different hosts running Flume so that if one dies for any reason, the other would take over and no events should be lost or delayed. Later we may have more Flume hosts, depending on how well they cope with the expected traffic, but for now two will suffice to prove the concept. A load-balancing sink processor sounds like it might also be a solution, but again, I do not see how to configure this to work across more than one host.
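
One way sinks in a single group can target different hosts is to make them Avro sinks that forward to Avro sources on other agents. The fragment below is a hedged sketch of that layout, not something taken from this thread: the agent name `front`, the sink names `avroSink-1` and `avroSink-2`, the host names `collector-a.example.com` and `collector-b.example.com`, and port 4545 are all hypothetical placeholders. Each collector host would run its own agent with an Avro source and an HDFS sink.

```properties
# Sketch only: a front agent whose two sinks point at different hosts.
# All names, hosts, and ports below are hypothetical placeholders.
front.channels = fileChannel-1
front.sinks = avroSink-1 avroSink-2

front.sinks.avroSink-1.type = avro
front.sinks.avroSink-1.channel = fileChannel-1
front.sinks.avroSink-1.hostname = collector-a.example.com
front.sinks.avroSink-1.port = 4545

front.sinks.avroSink-2.type = avro
front.sinks.avroSink-2.channel = fileChannel-1
front.sinks.avroSink-2.hostname = collector-b.example.com
front.sinks.avroSink-2.port = 4545

# Failover between the two remote hosts; use load_balance instead of
# failover to spread events across both.
front.sinkgroups = sinkgroup1
front.sinkgroups.sinkgroup1.sinks = avroSink-1 avroSink-2
front.sinkgroups.sinkgroup1.processor.type = failover
front.sinkgroups.sinkgroup1.processor.priority.avroSink-1 = 10
front.sinkgroups.sinkgroup1.processor.priority.avroSink-2 = 5
```

In this layout the front agent itself remains a single point of failure, so it addresses failover between downstream hosts rather than redundancy of the agent reading from RabbitMQ.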

From: Hari Shreedharan [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, 20 February 2013 1:31 p.m.
Subject: Re: Architecting Flume for failover

Can you change the hdfs.path to hdfs:// and hdfs:// on hdfsSink-1 and hdfsSink-2 respectively (assuming those are your namenodes)? The "bind" configuration param does not really exist for HDFS Sink (it is only for the IPC sources). 
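
A minimal sketch of that change, assuming two namenodes: the host names `namenode-a.example.com` and `namenode-b.example.com` and port 8020 are hypothetical placeholders, since the real addresses are not shown in the message. The `bind` lines from the posted configuration are dropped because the HDFS sink does not use them.

```properties
# Sketch only: each HDFS sink writes to a different cluster via a
# fully qualified hdfs.path. Host names and port are hypothetical.
agent.sinks.hdfsSink-1.type = hdfs
agent.sinks.hdfsSink-1.channel = fileChannel-1
agent.sinks.hdfsSink-1.hdfs.path = hdfs://namenode-a.example.com:8020/flume/localbrain-events
agent.sinks.hdfsSink-1.hdfs.filePrefix = lb-events

agent.sinks.hdfsSink-2.type = hdfs
agent.sinks.hdfsSink-2.channel = fileChannel-1
agent.sinks.hdfsSink-2.hdfs.path = hdfs://namenode-b.example.com:8020/flume/localbrain-events
agent.sinks.hdfsSink-2.hdfs.filePrefix = lb-events
```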

Hari Shreedharan

On Tuesday, February 19, 2013 at 4:05 PM, Noel Duffy wrote:
If I disable the agent.sinks line, both my sinks are disabled and nothing gets written to HDFS. The status page no longer shows me any sinks.

From: Yogi Nerella [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, 20 February 2013 12:40 p.m.
Subject: Re: Architecting Flume for failover

Hi Noel,

Maybe you are specifying both sinkgro