Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Zookeeper >> mail # user >> watchers not fired after a disk failure?


Copy link to this message
-
Re: watchers not fired after a disk failure?
On 03/05/2012 03:09 PM, Neha Narkhede wrote:
> Jeremy,
>
>>> * This session is able to successfully create ephemeral znodes, but watches never fire for the session.  For example, when the session has a children watch set on /election/a, and then creates /election/a/a_00000001, its watch on /election/a never fires (but it does fire for sessions coming from other clients).
> Have you tried running wchc on your zookeeper servers to see what
> paths the session ids are watching ?

Unfortunately this is a bug report that came in several days after the
system was in that state, so I am not able to run anything.  Thanks for
the suggestion though.
> Thanks,
> Neha
>
> On Mon, Mar 5, 2012 at 2:47 PM, Jeremy Stribling<[EMAIL PROTECTED]>  wrote:
>> I have been investigating an issue at one of our customers (our product
>> embeds Zookeeper in it), and here's a summary of what I've been able to
>> discern:
>>
>> * One of the servers in a 3-server ZK cluster (ZK 3.3.3 with some patches, C
>> client) experiences a hardware/firmware failure of its RAID partition
>> (possibly affecting the network card as well).  From /var/log/syslog:
>>
>> -------------------------------
>> Feb 24 01:53:54 controller3 kernel: [881855.148384] megaraid_sas
>> 0000:03:00.0: vpd r/w failed.  This is likely a firmware bug on this device.
>>   Contact the card vendor for a firmware update.
>> Feb 24 01:53:54 controller3 kernel: [1398904.161148] bnx2 0000:01:00.0: irq
>> 73 for MSI/MSI-X
>> Feb 24 01:53:54 controller3 kernel: [1398904.161157] bnx2 0000:01:00.0: irq
>> 74 for MSI/MSI-X
>> Feb 24 01:53:54 controller3 kernel: [1398904.161165] bnx2 0000:01:00.0: irq
>> 75 for MSI/MSI-X
>> Feb 24 01:53:54 controller3 kernel: [1398904.161173] bnx2 0000:01:00.0: irq
>> 76 for MSI/MSI-X
>> Feb 24 01:53:54 controller3 kernel: [1398904.161181] bnx2 0000:01:00.0: irq
>> 77 for MSI/MSI-X
>> Feb 24 01:53:54 controller3 kernel: [1398904.161188] bnx2 0000:01:00.0: irq
>> 78 for MSI/MSI-X
>> Feb 24 01:53:54 controller3 kernel: [1398904.161196] bnx2 0000:01:00.0: irq
>> 79 for MSI/MSI-X
>> Feb 24 01:53:54 controller3 kernel: [1398904.161203] bnx2 0000:01:00.0: irq
>> 80 for MSI/MSI-X
>> Feb 24 01:53:54 controller3 kernel: [1398904.161210] bnx2 0000:01:00.0: irq
>> 81 for MSI/MSI-X
>> Feb 24 01:53:54 controller3 kernel: [1398904.241931] bnx2 0000:01:00.0:
>> eth0: using MSIX
>> Feb 24 01:53:54 controller3 kernel: [1398904.243302] ADDRCONF(NETDEV_UP):
>> eth0: link is not ready
>> Feb 24 01:53:57 controller3 kernel: [1398907.309739] bnx2 0000:01:00.0:
>> eth0: NIC Copper Link is Up, 1000 Mbps full duplex, receive&  transmit flow
>> control ON
>> Feb 24 01:53:57 controller3 kernel: [1398907.311239]
>> ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>> Feb 24 01:55:06 controller3 ntpdate[10764]: step time server 72.3.128.241
>> offset 63.134559 sec
>> Feb 24 01:55:06 controller3 collectd[1691]: uc_update: Value too old: name >> controller/disk-sda/disk_octets; value time = 1330048506; last cache update
>> = 1330048506;
>> Feb 24 01:55:06 controller3 collectd[1691]: Filter subsystem: Built-in
>> target `write': Dispatching value to all write plugins failed with status
>> -1.
>> Feb 24 01:55:06 controller3 collectd[1691]: uc_update: Value too old: name >> controller/disk-sda/disk_ops; value time = 1330048506; last cache update >> 1330048506;
>> Feb 24 01:55:06 controller3 collectd[1691]: Filter subsystem: Built-in
>> target `write': Dispatching value to all write plugins failed with status
>> -1.
>> ...
>> Feb 24 01:55:06 controller3 collectd[1691]: Filter subsystem: Built-in
>> target `write': Dispatching value to all write plugins failed with status
>> -1.
>> Feb 24 01:55:10 controller3 kernel: [1398917.517473] eth0: no IPv6 routers
>> present
>> Feb 24 01:55:56 controller3 kernel: [1398962.889915] bnx2 0000:01:00.0: irq
>> 73 for MSI/MSI-X
>> Feb 24 01:55:56 controller3 kernel: [1398962.889924] bnx2 0000:01:00.0: irq
>> 74 for MSI/MSI-X
>> Feb 24 01:55:56 controller3 kernel: [1398962.889932] bnx2 0000:01:00.0: irq