Guy Doulberg 2012-10-11, 14:48
Hi guys,
I am trying to understand a phenomenon I am seeing in my cluster.
My cluster consists of 3 ZooKeeper nodes (on the same machines as the brokers).
Sometimes ZooKeeper freezes, which means I can't start new consumers and I can't browse the znodes using a ZooKeeper browser.
The problem goes away after a while, without my doing anything proactively.
A consumer that was already running keeps working all right; I guess that since ZooKeeper is not available, the consumer doesn't report offsets. My connection string lists all 3 ZooKeeper instances. Has this ever happened to any of you?
Thanks,
Guy Doulberg, Data Infrastructure Engineer, Conduit
Neha Narkhede 2012-10-11, 15:28
Guy,
When you say ZooKeeper freezes, have you tried running any of the ZooKeeper four-letter commands (ruok, cons, etc.)? Also, what do the ZooKeeper and consumer logs look like? It would be helpful if you could share these observations and logs.
Thanks, Neha
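For anyone following along, here is a minimal sketch of sending a four-letter command such as ruok from Python instead of a shell tool. The function name four_letter_word and its parameters are made up for illustration; the only protocol facts assumed are that the command is written as plain ASCII to the client port and that the server replies and then closes the connection. Newer ZooKeeper releases may require these commands to be enabled in the server configuration.

```python
import socket


def four_letter_word(host: str, port: int, command: str = "ruok",
                     timeout: float = 5.0) -> str:
    """Send a ZooKeeper four-letter command and return the raw reply text.

    Hypothetical helper: opens a TCP connection to the client port,
    writes the command, and reads until the server closes the socket.
    Raises OSError (including timeouts) if the server does not answer.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Against a healthy server, four_letter_word(host, 2181, "ruok") should come back with imok (2181 is the default client port; adjust if yours differs). Running it against each of the three servers during a freeze would show whether a server is not answering at all or is answering slowly, since a timeout surfaces as an exception.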
Guy Doulberg 2012-10-11, 15:30
In the logs I got "connection reset by peer" and timeout exceptions.
I didn't run any ZooKeeper commands.
Jay Kreps 2012-10-11, 15:39
Are you logging GC activity on the ZooKeeper JVM? We had a lot of ZooKeeper GC problems before we got more scientific about our JVM settings (I think we added some notes on the operations page).
-Jay
Guy Doulberg 2012-10-11, 15:41
Thanks Jay,
I will monitor the JVM on that machine.
Neha Narkhede 2012-10-11, 15:42
Guy,
Connection reset messages are not particularly uncommon. Monitoring GC and I/O on ZooKeeper is necessary, as Jay mentions; however, if you were hitting issues due to GC, you would see session expirations and timeouts. It would be helpful if you could send around the log4j files.
Thanks, Neha
Patricio Echagüe 2012-10-12, 23:19
I'd recommend running iostat -x while you do the test, to rule out that you are I/O bound.
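As a rough sketch of how the iostat -x output could be checked automatically, the snippet below pulls the %util column per device and lists the devices above a threshold. The helper names (run_iostat, parse_util, busy_devices) and the 80% threshold are assumptions for illustration, not anything prescribed in this thread. The parser looks the column up by its header name, since the exact column set differs between sysstat versions.

```python
import subprocess


def run_iostat(interval: int = 5, count: int = 3) -> str:
    """Run `iostat -x <interval> <count>` and return its text output."""
    result = subprocess.run(
        ["iostat", "-x", str(interval), str(count)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_util(iostat_output: str) -> dict:
    """Map device name -> %util from `iostat -x` text.

    If the output holds several reports, the last one wins; the first
    report covers the time since boot, so later ones are more telling.
    """
    util = {}
    columns = None
    for line in iostat_output.splitlines():
        fields = line.split()
        if not fields:
            columns = None  # a blank line ends the current device table
            continue
        if fields[0].startswith("Device"):
            columns = fields  # header row of a device table
            continue
        if columns and "%util" in columns and len(fields) == len(columns):
            value = fields[columns.index("%util")].replace(",", ".")
            util[fields[0]] = float(value)
    return util


def busy_devices(iostat_output: str, threshold: float = 80.0) -> list:
    """Return the devices whose %util is at or above the (assumed) threshold."""
    return sorted(dev for dev, pct in parse_util(iostat_output).items()
                  if pct >= threshold)
```

Calling busy_devices(run_iostat()) while the freeze is happening would show whether the disk holding the ZooKeeper data is close to saturation at that moment.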
Guy Doulberg 2012-10-12, 23:34
How can ZooKeeper be I/O bound?
Isn't all its work done in memory?
Neha Narkhede 2012-10-12, 23:39
ZooKeeper does batched synchronous writes to disk. If you are I/O bound, each write takes longer, leading to a large number of queued-up writes on the leader. This affects ZooKeeper performance adversely and can even make it unavailable. That is one of the reasons we deploy the ZooKeeper transaction log on a dedicated disk and monitor I/O closely.
Thanks, Neha
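To get a feel for how expensive synchronous writes are on a given disk, a small probe like the following can help. It is only a sketch and not ZooKeeper's own code: it appends small records to a scratch file and calls fsync after each one, whereas ZooKeeper batches its writes. The function names, the record size, and the write count are all assumptions chosen for illustration.

```python
import os
import tempfile
import time


def time_fsyncs(directory: str, writes: int = 50,
                payload_size: int = 1024) -> list:
    """Write `writes` small records into a scratch file under `directory`,
    fsync after each one, and return the per-write latencies in ms.
    The scratch file is removed afterwards."""
    payload = b"\0" * payload_size
    latencies_ms = []
    fd, path = tempfile.mkstemp(dir=directory, prefix="fsync-probe-")
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # block until the data has been handed to the device
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies_ms


def summarize(latencies_ms: list) -> dict:
    """Return the median and worst latency of a non-empty list of samples."""
    ordered = sorted(latencies_ms)
    return {"median_ms": ordered[len(ordered) // 2], "max_ms": ordered[-1]}
```

Pointing time_fsyncs at the directory that holds the transaction log, once while the machine is idle and once while the broker is under load, gives a rough comparison of how much the shared disk slows down synchronous writes.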