Botond Hejj 2011-12-12, 17:06
Hi ZooKeeper users,
I am currently experimenting with ZooKeeper and testing what happens if the leader of an ensemble goes down. I know that during leader election the ZooKeeper servers won't reply to any requests, and if leader election takes a long time then existing sessions might expire. What I see in my tests is that each server reads the last snapshot file to get the last zxid for leader election, and when the leader is elected, the leader reads the snapshot again before it syncs the followers.
This means that the more data we store in ZooKeeper, the longer it takes to elect a new leader. It also means that as the load on the ensemble increases, clients need a bigger session timeout to "survive" the loss of the leader.
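Why a server needs its last zxid before it can vote can be sketched as follows. This is a simplified Java model, assuming votes are ordered by last logged zxid and then by server id (actual ZooKeeper versions may compare more fields, such as the epoch). The class and method names (`Vote`, `supersedes`) are hypothetical and not ZooKeeper's API. The zxid layout (epoch in the high 32 bits, counter in the low 32 bits) follows ZooKeeper's convention.

```java
// Simplified model of a leader-election vote. Assumption: votes are ordered
// by last logged zxid, then by server id. Names are hypothetical.
final class Vote {
    final long serverId;
    final long lastZxid;

    Vote(long serverId, long lastZxid) {
        this.serverId = serverId;
        this.lastZxid = lastZxid;
    }

    // A zxid packs the epoch in the high 32 bits and a counter in the low 32 bits.
    static long zxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    // True if 'candidate' should replace 'current' as this server's vote:
    // the more up-to-date server wins, and ties are broken by server id.
    static boolean supersedes(Vote candidate, Vote current) {
        if (candidate.lastZxid != current.lastZxid) {
            return candidate.lastZxid > current.lastZxid;
        }
        return candidate.serverId > current.serverId;
    }
}
```

In this model, a server cannot vote sensibly until it knows its own `lastZxid`. That is why the value has to come from disk (snapshot plus log) if it is not already in memory.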
Is it possible to do anything about this and have a fast leader election even if the snapshot is big?
Regards, Botond Hejj
-
Re: leader election length
Camille Fournier 2011-12-12, 17:35
Existing sessions will not expire from the server side during election. Your client code may choose to close them on its end if you sit in a DISCONNECTED state for too long, but nothing should be expiring the sessions while quorum is not available.
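The point here is that, during an election, a session can only be lost through the client's own choice. A minimal sketch of an application-side policy that tolerates DISCONNECTED and gives up only on EXPIRED or past a self-chosen limit might look like this in Java. `ConnState` and `DisconnectPolicy` are hypothetical names. The states mirror ZooKeeper's `Watcher.Event.KeeperState` values, but this is a standalone model. A real client would call `onState` from its `Watcher`.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical connection states; the names mirror ZooKeeper's
// Watcher.Event.KeeperState values, but this enum is a standalone model.
enum ConnState { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

// Application-side policy: tolerate DISCONNECTED (e.g. during a leader
// election) and give up only if the server reports EXPIRED or the
// application's own limit on time spent disconnected is exceeded.
final class DisconnectPolicy {
    private final Duration maxDisconnected;
    private Instant disconnectedSince; // null while connected
    private boolean expired;

    DisconnectPolicy(Duration maxDisconnected) {
        this.maxDisconnected = maxDisconnected;
    }

    // Feed every observed state change here, with the time it was seen.
    void onState(ConnState state, Instant now) {
        switch (state) {
            case DISCONNECTED:
                if (disconnectedSince == null) {
                    disconnectedSince = now; // start of this disconnected spell
                }
                break;
            case SYNC_CONNECTED:
                disconnectedSince = null; // reconnected; the session survived
                break;
            case EXPIRED:
                expired = true; // only this means the session is gone server-side
                break;
        }
    }

    boolean shouldGiveUp(Instant now) {
        if (expired) {
            return true;
        }
        return disconnectedSince != null
                && Duration.between(disconnectedSince, now).compareTo(maxDisconnected) > 0;
    }
}
```

In this design, the limit passed to the constructor is purely an application choice. Setting it above the expected election time means a slow election only shows up as a temporary DISCONNECTED state.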
C
-
Re: leader election length
Botond Hejj 2011-12-13, 10:55
Thanks Camille,
Sorry, my assumption was wrong. I have now run a test, and indeed the session doesn't expire in this case. This means that the only problem is that the service is down for an increasing amount of time as the snapshot grows. We store the snapshot on a SAN, and reading it back twice during leader election can be a little slow (~30 secs). It is not bad at all, but has anybody tried making leader election faster in this case?
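The ~30 s figure can be turned into a rough estimate of what a single snapshot read costs. This Java sketch assumes that the downtime is almost entirely the two snapshot reads and that both reads take about the same time. The class name `SnapshotReadEstimate` is hypothetical.

```java
// Back-of-the-envelope model: if election downtime is dominated by
// snapshot reads of equal cost, each read costs (observed time / reads).
final class SnapshotReadEstimate {
    // Seconds per snapshot read, derived from an observed total.
    static double secondsPerRead(double observedSeconds, int readsObserved) {
        if (readsObserved <= 0) {
            throw new IllegalArgumentException("readsObserved must be positive");
        }
        return observedSeconds / readsObserved;
    }

    // Projected downtime if the number of reads changes, all else equal.
    static double projectedSeconds(double observedSeconds, int readsObserved, int readsProjected) {
        return secondsPerRead(observedSeconds, readsObserved) * readsProjected;
    }
}
```

With 30 s observed over 2 reads, this gives about 15 s per read. Under the same assumptions, dropping one read would cut the downtime to about 15 s.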
Regards, Botond
-- Botond Hejj Morgan Stanley | Technology Lechner Odon fasor 8 | Floor 07 Budapest, 1095 Phone: +36 1 881-3962 [EMAIL PROTECTED]
-
Re: leader election length
Flavio Junqueira 2011-12-13, 13:02
Hi Botond, I'm under the impression that the leader only loads the database twice when it is bootstrapping. In QuorumPeer.start(), we call loadDataBase(). The second time corresponds to the call to zk.loadData() in Leader.lead(). There could be a third time if zkDb.isInitialized() is false, through the call path lookForLeader() -> getInitLastLoggedZxid() -> getLastLoggedZxid(). I don't see how it could be false, though.
After we get into the main QuorumPeer loop (in run()), we only call loadDataBase() through lookForLeader(), via getInitLastLoggedZxid() -> getLastLoggedZxid(). loadDataBase() is only executed if zkDb.isInitialized() is false, and I actually don't see a case in which that condition would hold for the call coming from lookForLeader().
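The guard described above (load only if not yet initialized) can be modeled in a few lines. This is a hypothetical Java sketch, not ZooKeeper's code: `ZkDbModel` stands in for the in-memory database, and `loadDataBase()` stands in for the slow snapshot read. Once start() has loaded the data, the call reached from lookForLeader() returns the cached zxid without touching the disk.

```java
// Hypothetical model of the lazy, guarded load: the snapshot is read
// only if the in-memory database is not initialized yet.
final class ZkDbModel {
    private final long zxidOnDisk;   // last zxid recorded in the snapshot/log
    private boolean initialized = false;
    private long lastProcessedZxid = -1;
    private int snapshotReads = 0;

    ZkDbModel(long zxidOnDisk) {
        this.zxidOnDisk = zxidOnDisk;
    }

    boolean isInitialized() {
        return initialized;
    }

    // Stands in for reading the snapshot from disk (the slow part on a SAN).
    long loadDataBase() {
        snapshotReads++;
        lastProcessedZxid = zxidOnDisk;
        initialized = true;
        return lastProcessedZxid;
    }

    // Models the getLastLoggedZxid() step on the lookForLeader() path:
    // it only hits the disk if nothing has been loaded yet.
    long getLastLoggedZxid() {
        if (!isInitialized()) {
            loadDataBase();
        }
        return lastProcessedZxid;
    }

    int snapshotReads() {
        return snapshotReads;
    }
}
```

Usage: after `db.loadDataBase()` (the start() call), `db.getLastLoggedZxid()` returns the same zxid and `db.snapshotReads()` stays at 1.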
The part I can't remember is why we need to call loadData() in Leader.lead() when the data tree has already been initialized. If we are somehow skipping it in that case, then I missed it. I'll keep looking into it...
-Flavio
flavio junqueira
research scientist
[EMAIL PROTECTED] direct +34 93-183-8828
avinguda diagonal 177, 8th floor, barcelona, 08018, es phone (408) 349 3300 fax (408) 349 3301
-
Re: leader election length
Botond Hejj 2011-12-14, 14:44
Thanks, Flavio, for looking into this. If the loadData() call can be removed from lead(), that could improve leader election a lot on slow disks with large amounts of data.
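The effect of the proposed change can be sketched by counting snapshot reads on a peer that bootstraps and then becomes leader. This hypothetical Java model follows the call path Flavio describes: start(), then a guarded load on the lookForLeader() path, then lead(). It compares an unconditional reload in lead() with a variant that skips the reload when the tree is already initialized. It says nothing about whether that reload is needed for correctness, which is the open question in this thread.

```java
// Hypothetical model: count snapshot reads from QuorumPeer.start() until
// the peer is leading. 'reloadInLead' models an unconditional loadData()
// in Leader.lead(); false models skipping it when already initialized.
final class BootstrapModel {
    private boolean initialized = false;
    private int snapshotReads = 0;

    private void readSnapshot() {
        snapshotReads++;
        initialized = true;
    }

    static int readsUntilLeading(boolean reloadInLead) {
        BootstrapModel peer = new BootstrapModel();
        peer.readSnapshot();                      // start() -> loadDataBase()
        if (!peer.initialized) {                  // lookForLeader(): guarded, normally skipped
            peer.readSnapshot();
        }
        if (reloadInLead || !peer.initialized) {  // lead() -> loadData()
            peer.readSnapshot();
        }
        return peer.snapshotReads;
    }
}
```

In this model, `readsUntilLeading(true)` yields 2 reads, matching the bootstrap case described above. `readsUntilLeading(false)` yields 1.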
Botond