Flume >> mail # user >> Architecting Flume for failover


Re: Architecting Flume for failover
Noel,

What test did you perform?
Did you stop sink-2?
Currently you have set a higher priority for sink-2, so it will be the
default sink as long as it is up and running.

-Jeff

http://flume.apache.org/FlumeUserGuide.html#failover-sink-processor
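For reference, the failover sink processor from the linked guide is configured along these lines (a minimal sketch; the agent name, sink names, and priority values are illustrative — the higher-priority sink is active until it fails):

```properties
agent.sinkgroups = sg1
agent.sinkgroups.sg1.sinks = sink-1 sink-2
# Route all events to the highest-priority sink that is alive.
agent.sinkgroups.sg1.processor.type = failover
agent.sinkgroups.sg1.processor.priority.sink-1 = 5
agent.sinkgroups.sg1.processor.priority.sink-2 = 10
# How long (ms) a failed sink stays blacklisted before a retry.
agent.sinkgroups.sg1.processor.maxpenalty = 10000
```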
On Tue, Feb 19, 2013 at 5:03 PM, Noel Duffy <[EMAIL PROTECTED]> wrote:

> The IP addresses 10.20.30.81 and 10.20.30.119 are the addresses of the
> Flume agents. The first agent is on 10.20.30.81, the second on
> 10.20.30.119. The idea was to have two sinks on different hosts and to
> configure Flume to failover to the second host if the first host should
> disappear. Although the documentation does not say so explicitly, I have
> read posts online which say that such a configuration is possible. I am
> running Flume-ng 1.2.0.
>
> It may be that I am approaching this problem in the wrong way. We need to
> have Flume reading events from RabbitMQ and writing them to HDFS. We want
> to have two different hosts running Flume so that if one dies for any
> reason, the other would take over and no events should be lost or delayed.
> Later we may have more Flume hosts, depending on how well they cope with
> the expected traffic, but for now two will suffice to prove the concept. A
> load-balancing sink processor sounds like it might also be a solution, but
> again, I do not see how to configure this to work across more than one host.
>
>
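A sink group lives inside a single agent, so failover across hosts is normally built as a tiered topology: the first-tier agent holds a failover group of Avro sinks, each pointing at a collector agent on a different host, and each collector writes to HDFS. A sketch of the first tier (the host addresses are taken from the thread; the port, channel name, and sink names are assumptions):

```properties
# First tier: fail over between collector agents on two hosts.
agent.sinks = avroSink-1 avroSink-2
agent.sinks.avroSink-1.type = avro
agent.sinks.avroSink-1.channel = ch1
agent.sinks.avroSink-1.hostname = 10.20.30.81
agent.sinks.avroSink-1.port = 4545
agent.sinks.avroSink-2.type = avro
agent.sinks.avroSink-2.channel = ch1
agent.sinks.avroSink-2.hostname = 10.20.30.119
agent.sinks.avroSink-2.port = 4545
agent.sinkgroups = sinkgroup1
agent.sinkgroups.sinkgroup1.sinks = avroSink-1 avroSink-2
agent.sinkgroups.sinkgroup1.processor.type = failover
agent.sinkgroups.sinkgroup1.processor.priority.avroSink-1 = 10
agent.sinkgroups.sinkgroup1.processor.priority.avroSink-2 = 5
```

Each collector host would then run an agent with an Avro source listening on the same port, feeding its own HDFS sink.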
> From: Hari Shreedharan [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, 20 February 2013 1:31 p.m.
> To: [EMAIL PROTECTED]
> Subject: Re: Architecting Flume for failover
>
> Can you change the hdfs.path to hdfs://10.20.30.81/flume/localbrain-events
> and hdfs://10.20.30.119/flume/localbrain-events on hdfsSink-1 and hdfsSink-2
> respectively (assuming those are your namenodes)? The "bind" configuration
> param does not really exist for HDFS Sink (it is only for the IPC sources).
>
>
> Thanks
> Hari
>
> --
> Hari Shreedharan
>
> On Tuesday, February 19, 2013 at 4:05 PM, Noel Duffy wrote:
> If I disable the agent.sinks line, both my sinks are disabled and nothing
> gets written to HDFS. The status page no longer shows me any sinks.
>
> From: Yogi Nerella [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, 20 February 2013 12:40 p.m.
> To: [EMAIL PROTECTED]
> Subject: Re: Architecting Flume for failover
>
> Hi Noel,
>
> Maybe you are specifying both sinkgroups and sinks.
>
> Can you try removing the sinks?
> #agent.sinks = hdfsSink-1 hdfsSink-2
>
> Yogi
>
>
> On Tue, Feb 19, 2013 at 1:32 PM, Noel Duffy <[EMAIL PROTECTED]>
> wrote:
> I have a Flume agent that pulls events from RabbitMQ and pushes them into
> HDFS. So far so good, but now I want to have a second Flume agent on a
> different host acting as a hot backup for the first agent such that the
> loss of the first host running Flume would not cause any events to be lost.
> In the testing I've done I've gotten two Flume agents on separate hosts to
> read the same events from the RabbitMQ queue, but it's not clear to me how
> to configure the sinks such that only one of the sinks actually does
> something and the other does nothing.
>
> From reading the documentation, I supposed that a sinkgroup configured for
> failover was what I needed, but the documentation examples only cover the
> case where the sinks in a failover group are all on the same agent on the
> same host. I've seen messages online which seem to say that sinks in a
> sinkgroup can be on different hosts, but I can find no clear explanation of
> how to configure such a sinkgroup. How would sinks on different hosts
> communicate with one another? Would the sinks in the sinkgroup have to use
> a JDBC channel? Would the sinks have to be non-terminal sinks, like Avro?
>
> In my testing I set up two agents on different hosts and configured a
> sinkgroup containing two sinks, both HDFS sinks.
>
> agent.sinkgroups = sinkgroup1
> agent.sinkgroups.sinkgroup1.sinks = hdfsSink-1 hdfsSink-2
> agent.sinkgroups.sinkgroup1.processor.priority.hdfsSink-1 = 5