Flume user mailing list

Reliability in Flume
Dear Flume developers and users,

I understand that Flume NG uses channel-based transactions to guarantee
reliable message delivery between agents. But in some extreme failure
scenarios, does Flume still guarantee complete reliability? I have thought
of the following scenarios:

1. In transactions between agents, what happens if the receiving agent
process goes down just after it commits its put transaction but before it
sends the success acknowledgement to the sending agent? Will the sending
agent resend the same event when the receiving agent recovers, causing data
duplication?

2. In the communication between the client (the data source that sends data
to the first-hop agent) and the first-hop agent, what happens if the agent
process goes down just after it receives the event but before it saves the
event to its channel? Will this cause data loss?

3. In the communication between the final-hop agent and the storage system
(such as MySQL, HDFS, a file system, etc.), what happens if the agent goes
down after it has already saved some data to the storage but before it
commits the transaction? Will this cause data duplication after the agent
recovers?
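To make scenarios 1 and 2 concrete, here is a toy simulation. It is not Flume code and uses no Flume API: the names Receiver, send_with_retry and dedup are hypothetical, and it assumes the sender keeps each event and retries until it receives an acknowledgement (at-least-once delivery).

```python
"""Toy model of a transactional hand-off between two hops (NOT Flume's API).

Assumption: the sender keeps each event until it is acknowledged and retries
otherwise, i.e. at-least-once delivery.
"""
from dataclasses import dataclass, field


class Crash(Exception):
    """Simulated failure of the receiving process."""


@dataclass
class Receiver:
    channel: list = field(default_factory=list)  # committed (durable) events
    crash_before_commit: bool = False  # scenario 2: die before the put commits
    crash_after_commit: bool = False   # scenario 1: die after commit, before the ack

    def deliver(self, event: str) -> str:
        if self.crash_before_commit:
            self.crash_before_commit = False  # fail only once, then "recover"
            raise Crash("died before committing the put")
        self.channel.append(event)  # put transaction committed: event is durable
        if self.crash_after_commit:
            self.crash_after_commit = False  # fail only once, then "recover"
            raise Crash("died after committing, before sending the ack")
        return "ACK"


def send_with_retry(receiver: Receiver, event: str, max_attempts: int = 3) -> int:
    """Resend until acknowledged; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            if receiver.deliver(event) == "ACK":
                return attempt
        except Crash:
            # No ack: the sender cannot tell whether the put was committed.
            continue
    raise RuntimeError("event was never acknowledged")


def dedup(events: list) -> list:
    """Drop repeated event IDs, keeping first occurrences in order."""
    seen, out = set(), []
    for e in events:
        if e not in seen:
            seen.add(e)
            out.append(e)
    return out


if __name__ == "__main__":
    after = Receiver(crash_after_commit=True)
    print(send_with_retry(after, "event-1"), after.channel, dedup(after.channel))
    before = Receiver(crash_before_commit=True)
    print(send_with_retry(before, "event-2"), before.channel)
```

Under these assumptions, a crash before the commit is covered by the retry without losing the event, while a crash after the commit leaves two copies in the channel. Scenario 3 has the same shape with the storage system in the receiver's role. Deduplicating by a unique event ID is one possible mitigation; the sketch does not claim that Flume does this.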

Thank you very much!
--
Best Regards,
Henry Ma