Zookeeper, mail # user - Recovery time (was: Maximum size of a snapshot)


Flavio Junqueira 2013-07-17, 10:30
Flavio Junqueira 2013-07-17, 13:43
Re: Recovery time (was: Maximum size of a snapshot)
kishore g 2013-07-17, 16:23
On (1), loading state from disk to find the last zxid: does this mean it
loads the snapshot, or simply reads the tail of the transaction log?

On Wed, Jul 17, 2013 at 6:43 AM, Flavio Junqueira <[EMAIL PROTECTED]> wrote:

> I need to also mention ZOOKEEPER-1549 in the context of point (2) below.
> That's a blocker for 3.5.0.
>
> -Flavio
>
> On Jul 17, 2013, at 12:30 PM, Flavio Junqueira <[EMAIL PROTECTED]> wrote:
>
> > Moving the discussion to dev but keeping user on CC.
> >
> > Let's step back. We started the latest discussion in this thread because
> > Kishore is concerned about recovery time. There are a number of
> > improvements we have been looking at for the next release. Let me go over
> > my current understanding of the main points that add to the recovery time:
> >
> > 1- Before we even start leader election, each server loads state from
> > disk to determine its last zxid. The last zxid is used in the election.
> > 2- Once the leader is elected, it loads state from disk and takes a
> > snapshot. Loading the database again is unnecessary (ZOOKEEPER-1642) and
> > the snapshot adds latency. In fact, it is not even correct to have it
> > there (ZOOKEEPER-1558).
> > 3- A follower takes a snapshot before acknowledging the NEWLEADER
> > message, so the leader has to wait until a quorum of followers finish
> > their snapshots.
> >
> > The proposal I've heard here is to touch (1). For now, I'd rather keep
> > (1) as is and focus on fixing (2). We might be able to do something about
> > (3), and I'm actually not sure whether there has been a discussion about it.
> >
> > -Flavio
> >
> > On Jul 17, 2013, at 5:40 AM, Thawan Kooburat <[EMAIL PROTECTED]> wrote:
> >
> >> A client will get a session expired event only when a server explicitly
> >> tells the client. So any established sessions will remain in a
> >> disconnected state during the period.
> >>
> >> So my comment about the need for a longer session timeout might be
> >> incorrect. While the quorum is down during leader election, sessions
> >> won't expire. When the quorum comes back, the client has to reconnect
> >> within the session timeout in order to resume the session. However, the
> >> client won't be able to issue any read/write request or create a new
> >> session while the quorum is down.
> >>
> >> However, some applications may need a stronger consistency guarantee.
> >> They will have special logic to abort the client if it was disconnected
> >> for an extended period. This is because the client won't be able to tell
> >> whether the quorum is down or there is a network partition between the
> >> client and the quorum.
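
[Editor's sketch, not part of the original message.] The "special logic to abort the client" described above could look roughly like the following. This is a minimal, library-independent illustration: the class name, method names, and threshold are hypothetical, and a real application would call these hooks from its ZooKeeper client's connection-state callbacks.

```python
import time


class DisconnectWatchdog:
    """Tracks how long a client has been disconnected (hypothetical helper)."""

    def __init__(self, max_disconnected_s, clock=time.monotonic):
        self.max_disconnected_s = max_disconnected_s
        self._clock = clock  # injectable so the logic can be tested
        self._disconnected_since = None

    def on_disconnected(self):
        # Remember only the first disconnect of the current outage.
        if self._disconnected_since is None:
            self._disconnected_since = self._clock()

    def on_connected(self):
        # Reconnected in time: the outage is over.
        self._disconnected_since = None

    def should_abort(self):
        # The client cannot tell a quorum outage from a network partition,
        # so it gives up once it has been disconnected for too long.
        if self._disconnected_since is None:
            return False
        elapsed = self._clock() - self._disconnected_since
        return elapsed >= self.max_disconnected_s
```

An application would poll `should_abort()` periodically and stop acting on any state it holds (locks, leadership) once it returns true.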
> >>
> >>
> >> --
> >> Thawan Kooburat
> >>
> >>
> >>
> >>
> >>
> >> On 7/16/13 6:46 PM, "kishore g" <[EMAIL PROTECTED]> wrote:
> >>
> >>> Thanks Thawan. Another question to follow up: let's say client c1 is
> >>> connected to the leader and the leader fails. Now c1 is trying to
> >>> connect to another zk server, but all servers are busy loading the
> >>> snapshot, which can take a minute or two. According to Flavio, zk
> >>> servers don't accept any requests during synchronization, but most
> >>> clients don't keep that high a connection timeout. So does this mean
> >>> clients will time out on connection? Is my understanding correct, or
> >>> will zk servers accept connection requests but reject read/write
> >>> requests?
> >>>
> >>> thanks,
> >>> Kishore G
> >>>
> >>>
> >>> On Tue, Jul 16, 2013 at 3:45 PM, Thawan Kooburat <[EMAIL PROTECTED]> wrote:
> >>>
> >>>> There is a plan to work on this optimization: ZOOKEEPER-1674.
> >>>>
> >>>>
> >>>> --
> >>>> Thawan Kooburat
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On 7/16/13 1:37 PM, "kishore g" <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>>> All servers in the quorum reading the snapshot from disk as part of
> >>>>> the synchronization phase. From Thawan's email it looks like whenever
> >>>>> there is a leader election, all zk servers read the snapshot from
> >>>>> disk. I am not sure why all servers should reload the snapshot from
> >>>>> disk as this