Zookeeper, mail # user - watchers not fired after a disk failure?


Jeremy Stribling 2012-03-05, 22:47
Re: watchers not fired after a disk failure?
Neha Narkhede 2012-03-05, 23:09
Jeremy,

>> * This session is able to successfully create ephemeral znodes, but watches never fire for the session.  For example, when the session has a children watch set on /election/a, and then creates /election/a/a_00000001, its watch on /election/a never fires (but it does fire for sessions coming from other clients).

Have you tried running the wchc four-letter command on your ZooKeeper
servers to see which paths each session ID is watching?

Thanks,
Neha
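
The wchc check suggested above could be scripted roughly as follows. This is a minimal sketch, not something from the thread: the helper names four_letter_word and parse_wchc are hypothetical, and the parser assumes the usual wchc reply layout of one session ID per line followed by indented watched paths.

```python
import socket


def four_letter_word(host, port, cmd, timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw text reply.

    The server closes the connection after replying, so we read until EOF.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_wchc(text):
    """Group watched paths by session ID.

    Assumption: unindented lines are session IDs and indented lines are
    paths watched by the most recent session ID.
    """
    watches = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t":
            if current is not None:
                watches[current].append(line.strip())
        else:
            current = line.strip()
            watches.setdefault(current, [])
    return watches
```

For example, calling parse_wchc(four_letter_word("controller3", 2181, "wchc")) against each of the three servers (the host name is taken from the syslog below; 2181 is only the default client port and may differ in your deployment) would show whether the affected session ID still lists /election/a among its watches.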

On Mon, Mar 5, 2012 at 2:47 PM, Jeremy Stribling <[EMAIL PROTECTED]> wrote:
> I have been investigating an issue at one of our customers (our product
> embeds Zookeeper in it), and here's a summary of what I've been able to
> discern:
>
> * One of the servers in a 3-server ZK cluster (ZK 3.3.3 with some patches, C
> client) experiences a hardware/firmware failure of its RAID partition
> (possibly affecting the network card as well).  From /var/log/syslog:
>
> -------------------------------
> Feb 24 01:53:54 controller3 kernel: [881855.148384] megaraid_sas
> 0000:03:00.0: vpd r/w failed.  This is likely a firmware bug on this device.
>  Contact the card vendor for a firmware update.
> Feb 24 01:53:54 controller3 kernel: [1398904.161148] bnx2 0000:01:00.0: irq
> 73 for MSI/MSI-X
> Feb 24 01:53:54 controller3 kernel: [1398904.161157] bnx2 0000:01:00.0: irq
> 74 for MSI/MSI-X
> Feb 24 01:53:54 controller3 kernel: [1398904.161165] bnx2 0000:01:00.0: irq
> 75 for MSI/MSI-X
> Feb 24 01:53:54 controller3 kernel: [1398904.161173] bnx2 0000:01:00.0: irq
> 76 for MSI/MSI-X
> Feb 24 01:53:54 controller3 kernel: [1398904.161181] bnx2 0000:01:00.0: irq
> 77 for MSI/MSI-X
> Feb 24 01:53:54 controller3 kernel: [1398904.161188] bnx2 0000:01:00.0: irq
> 78 for MSI/MSI-X
> Feb 24 01:53:54 controller3 kernel: [1398904.161196] bnx2 0000:01:00.0: irq
> 79 for MSI/MSI-X
> Feb 24 01:53:54 controller3 kernel: [1398904.161203] bnx2 0000:01:00.0: irq
> 80 for MSI/MSI-X
> Feb 24 01:53:54 controller3 kernel: [1398904.161210] bnx2 0000:01:00.0: irq
> 81 for MSI/MSI-X
> Feb 24 01:53:54 controller3 kernel: [1398904.241931] bnx2 0000:01:00.0:
> eth0: using MSIX
> Feb 24 01:53:54 controller3 kernel: [1398904.243302] ADDRCONF(NETDEV_UP):
> eth0: link is not ready
> Feb 24 01:53:57 controller3 kernel: [1398907.309739] bnx2 0000:01:00.0:
> eth0: NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow
> control ON
> Feb 24 01:53:57 controller3 kernel: [1398907.311239]
> ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> Feb 24 01:55:06 controller3 ntpdate[10764]: step time server 72.3.128.241
> offset 63.134559 sec
> Feb 24 01:55:06 controller3 collectd[1691]: uc_update: Value too old: name =
> controller/disk-sda/disk_octets; value time = 1330048506; last cache update
> = 1330048506;
> Feb 24 01:55:06 controller3 collectd[1691]: Filter subsystem: Built-in
> target `write': Dispatching value to all write plugins failed with status
> -1.
> Feb 24 01:55:06 controller3 collectd[1691]: uc_update: Value too old: name =
> controller/disk-sda/disk_ops; value time = 1330048506; last cache update =
> 1330048506;
> Feb 24 01:55:06 controller3 collectd[1691]: Filter subsystem: Built-in
> target `write': Dispatching value to all write plugins failed with status
> -1.
> ...
> Feb 24 01:55:06 controller3 collectd[1691]: Filter subsystem: Built-in
> target `write': Dispatching value to all write plugins failed with status
> -1.
> Feb 24 01:55:10 controller3 kernel: [1398917.517473] eth0: no IPv6 routers
> present
> Feb 24 01:55:56 controller3 kernel: [1398962.889915] bnx2 0000:01:00.0: irq
> 73 for MSI/MSI-X
> Feb 24 01:55:56 controller3 kernel: [1398962.889924] bnx2 0000:01:00.0: irq
> 74 for MSI/MSI-X
> Feb 24 01:55:56 controller3 kernel: [1398962.889932] bnx2 0000:01:00.0: irq
> 75 for MSI/MSI-X
> Feb 24 01:55:56 controller3 kernel: [1398962.889939] bnx2 0000:01:00.0: irq
> 76 for MSI/MSI-X
> Feb 24 01:55:56 controller3 kernel: [1398962.889946] bnx2 0000:01:00.0: irq
> 77 for MSI/MSI-X
> Feb 24 01:55:56 controller3 kernel: [1398962.889953] bnx2 0000:01:00.0: irq
Jeremy Stribling 2012-03-05, 23:16