Perhaps it's a matter of semantics.
But I'm not talking only about failure; I'm also talking about normal operation.
It's normal to take a cluster down for maintenance or a code update, and
this should be done as a rolling restart (one server at a time).
The reason for replication is to increase reliability (as well as
availability). We've said that in 0.8.0 we can tolerate R-1 node failures
(where R is the replication factor). From the client's standpoint, this
should mean that the cluster is still available and reliable when a single
node is down (assuming R > 1).
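
To make the R-1 arithmetic concrete, here is a minimal sketch in plain
Python. It is not Kafka code: the function names are hypothetical, and it
assumes the worst case where every node that is down hosts a replica of the
partition in question.

```python
def tolerated_failures(replication_factor: int) -> int:
    """Replica failures a partition can absorb while keeping at least one copy."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor - 1


def partition_available(replication_factor: int, nodes_down: int) -> bool:
    """True while at least one replica is still up (worst-case assumption:
    every down node holds a replica of this partition)."""
    return 0 <= nodes_down <= tolerated_failures(replication_factor)
```

With R = 2, partition_available(2, 1) holds, so a rolling restart that takes
down one server at a time should not cost availability. With R = 1 it does not.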
When you say "at least once", you are suggesting that the message will be
delivered at least once, and won't be lost.
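
For reference, this is roughly what I understand "at least once" to mean on
the sending side: retry until the send is acknowledged, and accept duplicates.
The sketch below is plain Python with a made-up FlakyBroker standing in for a
server; none of these names are Kafka APIs.

```python
class FlakyBroker:
    """Hypothetical stand-in for a broker that stores every message it
    receives but loses the first `lost_acks` acknowledgements."""

    def __init__(self, lost_acks: int):
        self.log = []
        self.lost_acks = lost_acks

    def send(self, message) -> bool:
        self.log.append(message)  # the message is stored either way
        if self.lost_acks > 0:
            self.lost_acks -= 1
            return False  # the acknowledgement never reaches the sender
        return True


def send_at_least_once(message, send, max_attempts: int = 5) -> int:
    """Retry until send() acknowledges; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if send(message):
            return attempt
    raise RuntimeError("message was not acknowledged after %d attempts" % max_attempts)
```

The message is never lost as long as some attempt is acknowledged, but a lost
acknowledgement means the broker ends up with the same message twice.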
I don't think that guarantee was ever really true of the previous versions
(<= 0.7), and I don't think I ever really read that about 0.7 as a guarantee, did
On Sat, Oct 26, 2013 at 8:52 PM, Guozhang Wang <[EMAIL PROTECTED]> wrote: