|
|
-
watchers not fired after a disk failure?
Jeremy Stribling 2012-03-05, 22:47
I have been investigating an issue at one of our customers (our product embeds Zookeeper in it), and here's a summary of what I've been able to discern:
* One of the servers in a 3-server ZK cluster (ZK 3.3.3 with some patches, C client) experiences a hardware/firmware failure of its RAID partition (possibly affecting the network card as well). From /var/log/syslog:
------------------------------- Feb 24 01:53:54 controller3 kernel: [881855.148384] megaraid_sas 0000:03:00.0: vpd r/w failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update. Feb 24 01:53:54 controller3 kernel: [1398904.161148] bnx2 0000:01:00.0: irq 73 for MSI/MSI-X Feb 24 01:53:54 controller3 kernel: [1398904.161157] bnx2 0000:01:00.0: irq 74 for MSI/MSI-X Feb 24 01:53:54 controller3 kernel: [1398904.161165] bnx2 0000:01:00.0: irq 75 for MSI/MSI-X Feb 24 01:53:54 controller3 kernel: [1398904.161173] bnx2 0000:01:00.0: irq 76 for MSI/MSI-X Feb 24 01:53:54 controller3 kernel: [1398904.161181] bnx2 0000:01:00.0: irq 77 for MSI/MSI-X Feb 24 01:53:54 controller3 kernel: [1398904.161188] bnx2 0000:01:00.0: irq 78 for MSI/MSI-X Feb 24 01:53:54 controller3 kernel: [1398904.161196] bnx2 0000:01:00.0: irq 79 for MSI/MSI-X Feb 24 01:53:54 controller3 kernel: [1398904.161203] bnx2 0000:01:00.0: irq 80 for MSI/MSI-X Feb 24 01:53:54 controller3 kernel: [1398904.161210] bnx2 0000:01:00.0: irq 81 for MSI/MSI-X Feb 24 01:53:54 controller3 kernel: [1398904.241931] bnx2 0000:01:00.0: eth0: using MSIX Feb 24 01:53:54 controller3 kernel: [1398904.243302] ADDRCONF(NETDEV_UP): eth0: link is not ready Feb 24 01:53:57 controller3 kernel: [1398907.309739] bnx2 0000:01:00.0: eth0: NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON Feb 24 01:53:57 controller3 kernel: [1398907.311239] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready Feb 24 01:55:06 controller3 ntpdate[10764]: step time server 72.3.128.241 offset 63.134559 sec Feb 24 01:55:06 controller3 collectd[1691]: uc_update: Value too old: name = controller/disk-sda/disk_octets; value time = 1330048506; last cache update = 1330048506; Feb 24 01:55:06 controller3 collectd[1691]: Filter subsystem: Built-in target `write': Dispatching value to all write plugins failed with status -1. Feb 24 01:55:06 controller3 collectd[1691]: uc_update: Value too old: name = controller/disk-sda/disk_ops; value time = 1330048506; last cache update = 1330048506; Feb 24 01:55:06 controller3 collectd[1691]: Filter subsystem: Built-in target `write': Dispatching value to all write plugins failed with status -1. ... Feb 24 01:55:06 controller3 collectd[1691]: Filter subsystem: Built-in target `write': Dispatching value to all write plugins failed with status -1. Feb 24 01:55:10 controller3 kernel: [1398917.517473] eth0: no IPv6 routers present Feb 24 01:55:56 controller3 kernel: [1398962.889915] bnx2 0000:01:00.0: irq 73 for MSI/MSI-X Feb 24 01:55:56 controller3 kernel: [1398962.889924] bnx2 0000:01:00.0: irq 74 for MSI/MSI-X Feb 24 01:55:56 controller3 kernel: [1398962.889932] bnx2 0000:01:00.0: irq 75 for MSI/MSI-X Feb 24 01:55:56 controller3 kernel: [1398962.889939] bnx2 0000:01:00.0: irq 76 for MSI/MSI-X Feb 24 01:55:56 controller3 kernel: [1398962.889946] bnx2 0000:01:00.0: irq 77 for MSI/MSI-X Feb 24 01:55:56 controller3 kernel: [1398962.889953] bnx2 0000:01:00.0: irq 78 for MSI/MSI-X Feb 24 01:55:56 controller3 kernel: [1398962.889960] bnx2 0000:01:00.0: irq 79 for MSI/MSI-X Feb 24 01:55:56 controller3 kernel: [1398962.889968] bnx2 0000:01:00.0: irq 80 for MSI/MSI-X Feb 24 01:55:56 controller3 kernel: [1398962.889975] bnx2 0000:01:00.0: irq 81 for MSI/MSI-X Feb 24 01:55:56 controller3 kernel: [1398962.970592] bnx2 0000:01:00.0: eth0: using MSIX Feb 24 01:55:56 controller3 kernel: [1398962.971959] ADDRCONF(NETDEV_UP): eth0: link is not ready Feb 24 01:55:59 controller3 kernel: [1398966.028629] bnx2 0000:01:00.0: eth0: NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON Feb 24 01:55:59 controller3 kernel: [1398966.030132] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready Feb 24 01:56:05 controller3 ntpdate[10968]: step time server 72.3.128.241 offset 0.002667 sec
* This leads existing ZK client sessions to expire, and those clients attempt to make new sessions. * One client (on a different node) establishes a new session to this server at 01:55:56 (coinciding exactly with the second set of IRQ messages shown above):
2012-02-24 01:55:56,964:17928(0x7fbf0b077700):ZOO_INFO@check_events@1632: session establishment complete on server [10.14.237.70:2888], sessionId=0xd235ad12815a0000, negotiated timeout=6000
* This session is able to successfully create ephemeral znodes, but watches never fire for the session. For example, when the session has a children watch set on /election/a, and then creates /election/a/a_00000001, its watch on /election/a never fires (but it does fire for sessions coming from other clients).
Unfortunately I cannot reproduce this, and we don't have good logs. Because of this, I cannot definitively place the blame on Zookeeper -- there are layers of our product above ZK that could theoretically be the problem. But given the evidence I've looked at, ZK seems to be the likely culprit.
So my question is: is anyone aware of a failure scenario that would cause watches to not be fired, but allow other operations from the same session to be successful? And are there any workarounds? I don't expect much help on this from the community, given the lack of logs and evidence, but I just wanted to throw it out there and see if anyone had any ideas off the top of their head. Thanks,
Jeremy
-
Re: watchers not fired after a disk failure?
Neha Narkhede 2012-03-05, 23:09
Jeremy,
>> * This session is able to successfully create ephemeral znodes, but watches never fire for the session. For example, when the session has a children watch set on /election/a, and then creates /election/a/a_00000001, its watch on /election/a never fires (but it does fire for sessions coming from other clients).
Have you tried running wchc on your zookeeper servers to see what paths the session ids are watching ?
Thanks, Neha
On Mon, Mar 5, 2012 at 2:47 PM, Jeremy Stribling <[EMAIL PROTECTED]> wrote: > I have been investigating an issue at one of our customers (our product > embeds Zookeeper in it), and here's a summary of what I've been able to > discern: > > * One of the servers in a 3-server ZK cluster (ZK 3.3.3 with some patches, C > client) experiences a hardware/firmware failure of its RAID partition > (possibly affecting the network card as well). From /var/log/syslog: > > ------------------------------- > Feb 24 01:53:54 controller3 kernel: [881855.148384] megaraid_sas > 0000:03:00.0: vpd r/w failed. This is likely a firmware bug on this device. > Contact the card vendor for a firmware update. > Feb 24 01:53:54 controller3 kernel: [1398904.161148] bnx2 0000:01:00.0: irq > 73 for MSI/MSI-X > Feb 24 01:53:54 controller3 kernel: [1398904.161157] bnx2 0000:01:00.0: irq > 74 for MSI/MSI-X > Feb 24 01:53:54 controller3 kernel: [1398904.161165] bnx2 0000:01:00.0: irq > 75 for MSI/MSI-X > Feb 24 01:53:54 controller3 kernel: [1398904.161173] bnx2 0000:01:00.0: irq > 76 for MSI/MSI-X > Feb 24 01:53:54 controller3 kernel: [1398904.161181] bnx2 0000:01:00.0: irq > 77 for MSI/MSI-X > Feb 24 01:53:54 controller3 kernel: [1398904.161188] bnx2 0000:01:00.0: irq > 78 for MSI/MSI-X > Feb 24 01:53:54 controller3 kernel: [1398904.161196] bnx2 0000:01:00.0: irq > 79 for MSI/MSI-X > Feb 24 01:53:54 controller3 kernel: [1398904.161203] bnx2 0000:01:00.0: irq > 80 for MSI/MSI-X > Feb 24 01:53:54 controller3 kernel: [1398904.161210] bnx2 0000:01:00.0: irq > 81 for MSI/MSI-X > Feb 24 01:53:54 controller3 kernel: [1398904.241931] bnx2 0000:01:00.0: > eth0: using MSIX > Feb 24 01:53:54 controller3 kernel: [1398904.243302] ADDRCONF(NETDEV_UP): > eth0: link is not ready > Feb 24 01:53:57 controller3 kernel: [1398907.309739] bnx2 0000:01:00.0: > eth0: NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow > control ON > Feb 24 01:53:57 controller3 kernel: [1398907.311239] > ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready > Feb 24 01:55:06 controller3 ntpdate[10764]: step time server 72.3.128.241 > offset 63.134559 sec > Feb 24 01:55:06 controller3 collectd[1691]: uc_update: Value too old: name > controller/disk-sda/disk_octets; value time = 1330048506; last cache update > = 1330048506; > Feb 24 01:55:06 controller3 collectd[1691]: Filter subsystem: Built-in > target `write': Dispatching value to all write plugins failed with status > -1. > Feb 24 01:55:06 controller3 collectd[1691]: uc_update: Value too old: name > controller/disk-sda/disk_ops; value time = 1330048506; last cache update > 1330048506; > Feb 24 01:55:06 controller3 collectd[1691]: Filter subsystem: Built-in > target `write': Dispatching value to all write plugins failed with status > -1. > ... > Feb 24 01:55:06 controller3 collectd[1691]: Filter subsystem: Built-in > target `write': Dispatching value to all write plugins failed with status > -1. > Feb 24 01:55:10 controller3 kernel: [1398917.517473] eth0: no IPv6 routers > present > Feb 24 01:55:56 controller3 kernel: [1398962.889915] bnx2 0000:01:00.0: irq > 73 for MSI/MSI-X > Feb 24 01:55:56 controller3 kernel: [1398962.889924] bnx2 0000:01:00.0: irq > 74 for MSI/MSI-X > Feb 24 01:55:56 controller3 kernel: [1398962.889932] bnx2 0000:01:00.0: irq > 75 for MSI/MSI-X > Feb 24 01:55:56 controller3 kernel: [1398962.889939] bnx2 0000:01:00.0: irq > 76 for MSI/MSI-X > Feb 24 01:55:56 controller3 kernel: [1398962.889946] bnx2 0000:01:00.0: irq > 77 for MSI/MSI-X > Feb 24 01:55:56 controller3 kernel: [1398962.889953] bnx2 0000:01:00.0: irq
-
Re: watchers not fired after a disk failure?
Jeremy Stribling 2012-03-05, 23:16
On 03/05/2012 03:09 PM, Neha Narkhede wrote: > Jeremy, > >>> * This session is able to successfully create ephemeral znodes, but watches never fire for the session. For example, when the session has a children watch set on /election/a, and then creates /election/a/a_00000001, its watch on /election/a never fires (but it does fire for sessions coming from other clients). > Have you tried running wchc on your zookeeper servers to see what > paths the session ids are watching ?
Unfortunately this is a bug report that came in several days after the system was in that state, so I am not able to run anything. Thanks for the suggestion though. > Thanks, > Neha > > On Mon, Mar 5, 2012 at 2:47 PM, Jeremy Stribling<[EMAIL PROTECTED]> wrote: >> I have been investigating an issue at one of our customers (our product >> embeds Zookeeper in it), and here's a summary of what I've been able to >> discern: >> >> * One of the servers in a 3-server ZK cluster (ZK 3.3.3 with some patches, C >> client) experiences a hardware/firmware failure of its RAID partition >> (possibly affecting the network card as well). From /var/log/syslog: >> >> ------------------------------- >> Feb 24 01:53:54 controller3 kernel: [881855.148384] megaraid_sas >> 0000:03:00.0: vpd r/w failed. This is likely a firmware bug on this device. >> Contact the card vendor for a firmware update. >> Feb 24 01:53:54 controller3 kernel: [1398904.161148] bnx2 0000:01:00.0: irq >> 73 for MSI/MSI-X >> Feb 24 01:53:54 controller3 kernel: [1398904.161157] bnx2 0000:01:00.0: irq >> 74 for MSI/MSI-X >> Feb 24 01:53:54 controller3 kernel: [1398904.161165] bnx2 0000:01:00.0: irq >> 75 for MSI/MSI-X >> Feb 24 01:53:54 controller3 kernel: [1398904.161173] bnx2 0000:01:00.0: irq >> 76 for MSI/MSI-X >> Feb 24 01:53:54 controller3 kernel: [1398904.161181] bnx2 0000:01:00.0: irq >> 77 for MSI/MSI-X >> Feb 24 01:53:54 controller3 kernel: [1398904.161188] bnx2 0000:01:00.0: irq >> 78 for MSI/MSI-X >> Feb 24 01:53:54 controller3 kernel: [1398904.161196] bnx2 0000:01:00.0: irq >> 79 for MSI/MSI-X >> Feb 24 01:53:54 controller3 kernel: [1398904.161203] bnx2 0000:01:00.0: irq >> 80 for MSI/MSI-X >> Feb 24 01:53:54 controller3 kernel: [1398904.161210] bnx2 0000:01:00.0: irq >> 81 for MSI/MSI-X >> Feb 24 01:53:54 controller3 kernel: [1398904.241931] bnx2 0000:01:00.0: >> eth0: using MSIX >> Feb 24 01:53:54 controller3 kernel: [1398904.243302] ADDRCONF(NETDEV_UP): >> eth0: link is not ready >> Feb 24 01:53:57 controller3 kernel: [1398907.309739] bnx2 0000:01:00.0: >> eth0: NIC Copper Link is Up, 1000 Mbps full duplex, receive& transmit flow >> control ON >> Feb 24 01:53:57 controller3 kernel: [1398907.311239] >> ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready >> Feb 24 01:55:06 controller3 ntpdate[10764]: step time server 72.3.128.241 >> offset 63.134559 sec >> Feb 24 01:55:06 controller3 collectd[1691]: uc_update: Value too old: name >> controller/disk-sda/disk_octets; value time = 1330048506; last cache update >> = 1330048506; >> Feb 24 01:55:06 controller3 collectd[1691]: Filter subsystem: Built-in >> target `write': Dispatching value to all write plugins failed with status >> -1. >> Feb 24 01:55:06 controller3 collectd[1691]: uc_update: Value too old: name >> controller/disk-sda/disk_ops; value time = 1330048506; last cache update >> 1330048506; >> Feb 24 01:55:06 controller3 collectd[1691]: Filter subsystem: Built-in >> target `write': Dispatching value to all write plugins failed with status >> -1. >> ... >> Feb 24 01:55:06 controller3 collectd[1691]: Filter subsystem: Built-in >> target `write': Dispatching value to all write plugins failed with status >> -1. >> Feb 24 01:55:10 controller3 kernel: [1398917.517473] eth0: no IPv6 routers >> present >> Feb 24 01:55:56 controller3 kernel: [1398962.889915] bnx2 0000:01:00.0: irq >> 73 for MSI/MSI-X >> Feb 24 01:55:56 controller3 kernel: [1398962.889924] bnx2 0000:01:00.0: irq >> 74 for MSI/MSI-X >> Feb 24 01:55:56 controller3 kernel: [1398962.889932] bnx2 0000:01:00.0: irq
|
|