-
Healthcheck using the stat command
Natarajan Suresh 2012-01-23, 21:45
I am trying to write a small health check script for the ZooKeeper instances. The "stat" command gives me output like this:

===========================
Zookeeper version: 3.3.3-1073969, built on 02/23/2011 22:27 GMT
Clients:
 /127.0.0.1:38929[0](queued=0,recved=1,sent=0)
 /17.155.7.152:37603[1](queued=0,recved=474752,sent=474752)

Latency min/avg/max: 0/0/35
Received: 1113675
Sent: 1113706
Outstanding: 0
Zxid: 0x2000e2925
Mode: follower
Node count: 71
==========================

How do I know that the server is OK? I do not have a bad instance on hand to check what the output would be in that case. If anyone has already written a health check script, could you please share it with me?

Thanks
Suresh
-
Re: Healthcheck using the stat command
Philip Smith 2012-01-23, 21:51
There is a batch Java program that does a health check:

validateZookeeperService.validateZookeeperService()

which basically runs the ruok command. You could run the stat command and parse the response, but I think 99% of what you want can be had by simply looking for 'imok' in the response to the ruok command.

philip_smith@st11p00td-devlog001:~ 2 $ alias zkok
alias zkok='for idx in 1 2 3 4 5 ; do export zkserver="st11p00td-zookeeper00${idx}" ; echo "$zkserver $( echo ruok | nc $zkserver 2181 )" ; done'
philip_smith@st11p00td-devlog001:~ 3 $ zkok
st11p00td-zookeeper001 imok
st11p00td-zookeeper002 imok
st11p00td-zookeeper003 imok
st11p00td-zookeeper004 imok
st11p00td-zookeeper005 imok
philip_smith@st11p00td-devlog001:~ 4 $

Regards, Philip
Philip Smith Senior Software Engineer [EMAIL PROTECTED] 408 862-1360 office 530 574-1659 mobile
-
Re: Healthcheck using the stat command
Jordan Zimmerman 2012-01-23, 22:16
The problem with 'ruok' is that it doesn't tell you the state of the instance. 'ruok' might return 'imok' while the instance is not serving due to some other error. Only a 'stat' will tell you that.
-JZ
-
Re: Healthcheck using the stat command
Jonathan Simms 2012-01-24, 04:24
On Mon, Jan 23, 2012 at 5:16 PM, Jordan Zimmerman <[EMAIL PROTECTED]> wrote:
> The problem with 'ruok' is that it doesn't tell you the state of the
> instance. 'ruok' might return 'imok' but the instance might not be serving
> due to some other error. Only a 'stat' will tell you that.
>
> -JZ
Can you provide an example of what that would look like? A high Outstanding count? A Mode that's not "leader", "follower", or "observer"?
-J
-
Re: Healthcheck using the stat command
Camille Fournier 2012-01-24, 04:29
If the node is "leader", "follower", or "observer", you are OK. That's all you should need to look for. ruok pretty much just checks that the port is responding and the code is running. stat will actually look at the system and see whether the zkserver has started. If so, you'll get some info; if not, you'll see "This ZooKeeper instance is not currently serving requests".
C
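The rule described above (a Mode of leader, follower, or observer means healthy; the "not currently serving requests" message means unhealthy) could be sketched as a small shell function that classifies a response read from stdin. This is only an illustration: zk_classify and its exit codes are hypothetical names and conventions, not part of ZooKeeper, and a server whose Mode is anything other than the three values named in this thread would be reported as "unknown" until the list is extended.

```shell
# Classify the text of a 'stat' response read from stdin.
# zk_classify is a hypothetical helper name, not a ZooKeeper tool.
# Exit status: 0 = serving, 1 = not serving, 2 = unrecognised response.
zk_classify() {
    response=$(cat)

    # The message quoted in the thread for a server that has not started.
    case "$response" in
        *"not currently serving requests"*)
            echo "not-serving"
            return 1
            ;;
    esac

    # Extract the value of the first "Mode:" line, if there is one.
    mode=$(printf '%s\n' "$response" | sed -n 's/^Mode: *//p' | head -n 1)

    case "$mode" in
        leader|follower|observer)
            echo "$mode"
            return 0
            ;;
        *)
            echo "unknown"
            return 2
            ;;
    esac
}
```

It could be fed from the same nc pipeline used earlier in the thread, for example: echo stat | nc "$zkserver" 2181 | zk_classify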
-
Re: Healthcheck using the stat command
Patrick Hunt 2012-01-25, 23:32
Prefer "srvr" over "stat" in most cases: stat returns details on the connections, which you probably don't need (and it adds unnecessary load on the server).
Also look at mntr in 3.4+
Patrick
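For the mntr command mentioned above, a minimal sketch of pulling a single value out of its output might look like the following. It assumes mntr prints one tab-separated key/value pair per line (for example zk_server_state followed by the mode); mntr_value is a hypothetical helper name, and the key names should be checked against the output of your own server version.

```shell
# Print the value for one key from 'mntr' output read on stdin.
# Assumes tab-separated "key<TAB>value" lines; mntr_value is a
# hypothetical helper name. Exits non-zero if the key is absent.
mntr_value() {
    awk -F '\t' -v key="$1" '
        $1 == key { print $2; found = 1 }
        END       { exit !found }
    '
}
```

A possible use, again with the nc pipeline from earlier in the thread: echo mntr | nc "$zkserver" 2181 | mntr_value zk_server_state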
-
Re: Healthcheck using the stat command
Jordan Zimmerman 2012-01-26, 23:37
Is 'srvr' the same as 'stat' but without the clients? I'm relying on it in a monitor app. I need the Mode and the message "not currently serving".
-JZ
-
Re: Healthcheck using the stat command
Patrick Hunt 2012-01-27, 00:19
On Thu, Jan 26, 2012 at 3:37 PM, Jordan Zimmerman <[EMAIL PROTECTED]> wrote:
> Is 'srvr' the same as 'stat' but without the clients? I'm relying on it in
> a monitor app. I need the Mode and the message "not currently serving".
Yes, it's the exact same code path (stat/srvr) with a condition that outputs the connection details only on 'stat'.
Patrick
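Since srvr returns the same Mode line as stat without the connection details, the zkok alias from earlier in the thread could be adapted along these lines. This is a rough sketch, not a tested monitoring tool: zk_modes is a hypothetical name, port 2181 is taken from the alias above, the placeholder text printed for a failed server is arbitrary, and the function cannot tell an unreachable host from one that answers "not currently serving".

```shell
# Print "<host> <Mode>" for each server given, using 'srvr' instead of 'stat'.
# Returns non-zero if any server fails to report a Mode line.
# zk_modes is a hypothetical name; assumes nc is installed and that the
# servers listen on port 2181, as in the alias earlier in the thread.
zk_modes() {
    failed=0
    for host in "$@"; do
        # Keep only the value of the "Mode:" line from the response.
        mode=$(echo srvr | nc "$host" 2181 2>/dev/null | sed -n 's/^Mode: *//p')
        if [ -z "$mode" ]; then
            mode="NOT-SERVING-OR-UNREACHABLE"
            failed=1
        fi
        echo "$host $mode"
    done
    return "$failed"
}
```

Called as zk_modes st11p00td-zookeeper001 st11p00td-zookeeper002, it prints one line per host, and its exit status can drive an alert.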
-
Re: Healthcheck using the stat command
Marshall McMullen 2012-01-27, 00:33
Out of curiosity, what does 'mntr' stand for?
-
Re: Healthcheck using the stat command
Patrick Hunt 2012-01-27, 00:35
monitor. https://issues.apache.org/jira/browse/ZOOKEEPER-744
Patrick
-
Re: Healthcheck using the stat command
Marshall McMullen 2012-01-27, 00:36
Ah, makes perfect sense now. Thanks!
-
Re: Healthcheck using the stat command
Patrick Hunt 2012-01-27, 00:38
NP.