NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Zookeeper >> mail # user >> watchers not fired after a disk failure?


Jeremy Stribling 2012-03-05, 22:47
Re: watchers not fired after a disk failure?
Jeremy,

>> * This session is able to successfully create ephemeral znodes, but watches never fire for the session.  For example, when the session has a children watch set on /election/a, and then creates /election/a/a_00000001, its watch on /election/a never fires (but it does fire for sessions coming from other clients).

Have you tried running the wchc four-letter command on your ZooKeeper
servers to see which paths each session ID is watching?
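For anyone following up on the wchc suggestion: wchc is one of ZooKeeper's four-letter commands. You open a TCP connection to a server's client port, send the four ASCII bytes, and the server writes a plain-text report of watches grouped by session and then closes the connection. The sketch below is a minimal, hypothetical helper, not part of any ZooKeeper client library. The function names are made up for illustration, and the parser assumes the report lists a hex session ID on its own line followed by one tab-indented line per watched path. Verify that layout against your server's real output before relying on it.

```python
import socket


def four_letter_word(host: str, port: int, cmd: str = "wchc", timeout: float = 5.0) -> str:
    """Send a ZooKeeper four-letter command and return the raw text reply.

    The server answers and then closes the connection, so read until EOF.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: reply complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_wchc(text: str) -> dict[str, list[str]]:
    """Parse wchc output into {session_id: [watched paths]}.

    Assumed layout (hypothetical, check against real output): a session ID
    such as 0x135e8d0a7a10000 on its own line, followed by one indented
    line per path that session is watching.
    """
    watches: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith((" ", "\t")):
            # Indented line: a path watched by the most recent session.
            if current is not None:
                watches[current].append(line.strip())
        else:
            # Unindented line: start of a new session entry.
            current = line.strip()
            watches.setdefault(current, [])
    return watches
```

To check the session in question, you would call parse_wchc(four_letter_word(host, 2181)) against each server (2181 is only the conventional client port) and look for /election/a under that session's ID. On later ZooKeeper releases the command may also need to be enabled through the 4lw.commands.whitelist setting.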

Thanks,
Neha

On Mon, Mar 5, 2012 at 2:47 PM, Jeremy Stribling <[EMAIL PROTECTED]> wrote:
> I have been investigating an issue at one of our customers (our product
> embeds Zookeeper in it), and here's a summary of what I've been able to
> discern:
>
> * One of the servers in a 3-server ZK cluster (ZK 3.3.3 with some patches, C
> client) experiences a hardware/firmware failure of its RAID partition
> (possibly affecting the network card as well).  From /var/log/syslog:
>
> -------------------------------
> Feb 24 01:53:54 controller3 kernel: [881855.148384] megaraid_sas
> 0000:03:00.0: vpd r/w failed.  This is likely a firmware bug on this device.
>  Contact the card vendor for a firmware update.
> Feb 24 01:53:54 controller3 kernel: [1398904.161148] bnx2 0000:01:00.0: irq
> 73 for MSI/MSI-X
> Feb 24 01:53:54 controller3 kernel: [1398904.161157] bnx2 0000:01:00.0: irq
> 74 for MSI/MSI-X
> Feb 24 01:53:54 controller3 kernel: [1398904.161165] bnx2 0000:01:00.0: irq
> 75 for MSI/MSI-X
> Feb 24 01:53:54 controller3 kernel: [1398904.161173] bnx2 0000:01:00.0: irq
> 76 for MSI/MSI-X
> Feb 24 01:53:54 controller3 kernel: [1398904.161181] bnx2 0000:01:00.0: irq
> 77 for MSI/MSI-X
> Feb 24 01:53:54 controller3 kernel: [1398904.161188] bnx2 0000:01:00.0: irq
> 78 for MSI/MSI-X
> Feb 24 01:53:54 controller3 kernel: [1398904.161196] bnx2 0000:01:00.0: irq
> 79 for MSI/MSI-X
> Feb 24 01:53:54 controller3 kernel: [1398904.161203] bnx2 0000:01:00.0: irq
> 80 for MSI/MSI-X
> Feb 24 01:53:54 controller3 kernel: [1398904.161210] bnx2 0000:01:00.0: irq
> 81 for MSI/MSI-X
> Feb 24 01:53:54 controller3 kernel: [1398904.241931] bnx2 0000:01:00.0:
> eth0: using MSIX
> Feb 24 01:53:54 controller3 kernel: [1398904.243302] ADDRCONF(NETDEV_UP):
> eth0: link is not ready
> Feb 24 01:53:57 controller3 kernel: [1398907.309739] bnx2 0000:01:00.0:
> eth0: NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow
> control ON
> Feb 24 01:53:57 controller3 kernel: [1398907.311239]
> ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> Feb 24 01:55:06 controller3 ntpdate[10764]: step time server 72.3.128.241
> offset 63.134559 sec
> Feb 24 01:55:06 controller3 collectd[1691]: uc_update: Value too old: name =
> controller/disk-sda/disk_octets; value time = 1330048506; last cache update
> = 1330048506;
> Feb 24 01:55:06 controller3 collectd[1691]: Filter subsystem: Built-in
> target `write': Dispatching value to all write plugins failed with status
> -1.
> Feb 24 01:55:06 controller3 collectd[1691]: uc_update: Value too old: name =
> controller/disk-sda/disk_ops; value time = 1330048506; last cache update =
> 1330048506;
> Feb 24 01:55:06 controller3 collectd[1691]: Filter subsystem: Built-in
> target `write': Dispatching value to all write plugins failed with status
> -1.
> ...
> Feb 24 01:55:06 controller3 collectd[1691]: Filter subsystem: Built-in
> target `write': Dispatching value to all write plugins failed with status
> -1.
> Feb 24 01:55:10 controller3 kernel: [1398917.517473] eth0: no IPv6 routers
> present
> Feb 24 01:55:56 controller3 kernel: [1398962.889915] bnx2 0000:01:00.0: irq
> 73 for MSI/MSI-X
> Feb 24 01:55:56 controller3 kernel: [1398962.889924] bnx2 0000:01:00.0: irq
> 74 for MSI/MSI-X
> Feb 24 01:55:56 controller3 kernel: [1398962.889932] bnx2 0000:01:00.0: irq
> 75 for MSI/MSI-X
> Feb 24 01:55:56 controller3 kernel: [1398962.889939] bnx2 0000:01:00.0: irq
> 76 for MSI/MSI-X
> Feb 24 01:55:56 controller3 kernel: [1398962.889946] bnx2 0000:01:00.0: irq
> 77 for MSI/MSI-X
> Feb 24 01:55:56 controller3 kernel: [1398962.889953] bnx2 0000:01:00.0: irq
Jeremy Stribling 2012-03-05, 23:16