Re: Reliability in Flume
Please see inline...

On Wed, Jan 23, 2013 at 7:26 PM, Henry Ma <[EMAIL PROTECTED]> wrote:

> Dear Flume developers and users,
> I understand that Flume NG uses channel-based transactions to guarantee
> reliable message delivery between agents. But in some extreme failure
> scenarios, will Flume stay fully reliable? I have thought of the scenarios
> below.
> 1. In transactions between agents, what will happen if the receiving agent
> process goes down just after it commits its put transaction and before it
> sends the success indication to the sending agent? Will the sending agent
> send the same event again when the receiving agent recovers, and cause data
> duplication?

Yes, it will cause duplication in this case: the sending agent never gets the
success indication, so it sends the same events again. But it's not that
common if you do proper capacity planning and tuning.
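
To make that failure window concrete, here is a toy Python simulation. It is
not Flume code: `Receiver`, `send_with_retry` and the `crash_before_ack` flag
are made-up names, and the "channel" is just a list. It only shows why a lost
acknowledgement plus a retry leaves the batch in the receiver twice.

```python
class Receiver:
    """Toy stand-in for a receiving agent (hypothetical, not Flume code)."""

    def __init__(self):
        self.channel = []              # events committed to the channel
        self.crash_before_ack = False  # simulate dying right after the commit

    def receive(self, batch):
        self.channel.extend(batch)     # the put transaction commits here
        if self.crash_before_ack:
            self.crash_before_ack = False  # the process "restarts" afterwards
            raise ConnectionError("receiver died before sending the ack")
        return "OK"


def send_with_retry(receiver, batch, max_attempts=3):
    """Sender side: resend the whole batch until an ack arrives."""
    for attempt in range(1, max_attempts + 1):
        try:
            receiver.receive(batch)
            return attempt             # ack received, sender can commit
        except ConnectionError:
            continue                   # no ack: roll back and try again
    raise RuntimeError("gave up after %d attempts" % max_attempts)


receiver = Receiver()
receiver.crash_before_ack = True
attempts = send_with_retry(receiver, ["e1", "e2"])
print(attempts, receiver.channel)  # 2 ['e1', 'e2', 'e1', 'e2']
```

The sender cannot tell "commit failed" from "commit succeeded but the ack was
lost", so resending is the only safe choice, and that is where the duplicate
comes from.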

> 2. In the communication between the client (data source, sending data to
> the first-hop agent) and the first-hop agent, what will happen if the
> agent process goes down just after it receives the event and before it saves
> it to its channel? Will it cause data loss?

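It will not cause data loss, because the agent saves the event to its channel
before acknowledging the transaction. If the agent dies before that, the
client gets no acknowledgement, so it knows the event was not accepted and
can send it again.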
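
The ordering can be sketched in a few lines of Python. Again this is a toy
model, not Flume code: `Agent`, `client_send` and `crash_before_commit` are
invented for illustration, and it assumes the client retries and that the
channel survives a restart.

```python
class Agent:
    """Toy first-hop agent (hypothetical, not Flume code)."""

    def __init__(self):
        self.channel = []                 # assumed durable for this sketch
        self.crash_before_commit = False  # simulate dying before the commit

    def handle(self, event):
        if self.crash_before_commit:
            self.crash_before_commit = False  # the process "restarts" afterwards
            raise ConnectionError("agent died before the channel commit")
        self.channel.append(event)        # 1. commit to the channel
        return "ACK"                      # 2. only then acknowledge


def client_send(agent, event, max_attempts=3):
    """Client side: keep the event until the agent acknowledges it."""
    for _ in range(max_attempts):
        try:
            if agent.handle(event) == "ACK":
                return True
        except ConnectionError:
            pass                          # no ack: client still has the event
    return False


agent = Agent()
agent.crash_before_commit = True
delivered = client_send(agent, "e1")
print(delivered, agent.channel)  # True ['e1']
```

Because the acknowledgement is only sent after the commit, an event is never
acknowledged while it is still uncommitted. The first attempt stores nothing
and returns no ack, so the retry delivers the event exactly once here.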

> 3. In the communication between the final-hop agent and the storage system
> (such as MySQL, HDFS, file system, etc.), what happens if the agent goes
> down before it commits the saving transaction but has already saved some
> data in the storage? Will this cause data duplication after the agent
> recovers?

Yes, this scenario can also cause duplicates.
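
One common way to live with replays at the last hop is to make the write
idempotent. The Python sketch below is only an illustration under stated
assumptions: it assumes every event carries a unique ID assigned upstream
(the `event_id` values here are made up) and that the storage can upsert by
key, as a table with a primary key can. `KeyedStore` and `flush_batch` are
hypothetical names, not Flume APIs.

```python
class KeyedStore:
    """Toy storage that upserts by key, like a table with a primary key."""

    def __init__(self):
        self.rows = {}

    def upsert(self, key, value):
        self.rows[key] = value  # writing the same key twice is harmless


def flush_batch(store, batch, fail_after=None):
    """Write a batch; optionally die partway through, before the commit."""
    for i, (event_id, body) in enumerate(batch):
        if fail_after is not None and i == fail_after:
            raise RuntimeError("agent died before committing the batch")
        store.upsert(event_id, body)


store = KeyedStore()
batch = [("id-1", "a"), ("id-2", "b"), ("id-3", "c")]

try:
    flush_batch(store, batch, fail_after=2)  # two rows written, then a crash
except RuntimeError:
    pass

flush_batch(store, batch)                    # whole batch replayed after restart
print(sorted(store.rows))  # ['id-1', 'id-2', 'id-3']
```

The replay still happens, but the two rows written before the crash are
overwritten rather than appended. An append-only target such as a plain file
has no key to upsert on, so there the duplicates would have to be removed
downstream instead.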