cluster confused...would not delete node until cluster restarted...
My 3-node cluster would not let me delete a znode, saying it was not
empty, but stat showed that it was in fact empty:

[zk: localhost:2181(CONNECTED) 1] delete /ROOT_A/INSTANCES/10.244.43.240/WORKERS
Node not empty: /ROOT_A/INSTANCES/10.244.43.240/WORKERS

[zk: localhost:2181(CONNECTED) 0] stat /ROOT_A/INSTANCES/10.244.43.240/WORKERS
cZxid = 0xe0015ad31
ctime = Wed May 22 17:20:52 EDT 2013
mZxid = 0xe0015ad31
mtime = Wed May 22 17:20:52 EDT 2013
pZxid = 0xe0015ae3c
cversion = 2
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 24
numChildren = 0

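For reading the zxids in the stat output and the logs: a zxid is a 64-bit value whose high 32 bits are the leader epoch and whose low 32 bits are a per-epoch transaction counter, and pZxid is the zxid of the last change to this znode's children. The sketch below splits the values quoted in this message. The helper name split_zxid is only illustrative and is not part of any ZooKeeper API.

```python
# Split ZooKeeper zxids into (epoch, counter): the high 32 bits hold the
# leader epoch, the low 32 bits hold the per-epoch transaction counter.

def split_zxid(zxid):
    """Return (epoch, counter) for a 64-bit zxid. Hypothetical helper."""
    return zxid >> 32, zxid & 0xFFFFFFFF

# Values copied from the stat output and debug log in this message.
ZXIDS = {
    "cZxid": 0xe0015ad31,
    "pZxid": 0xe0015ae3c,
    "failed delete": 0xe00200606,
}

for name, zxid in ZXIDS.items():
    epoch, counter = split_zxid(zxid)
    print(f"{name}: epoch={epoch} counter={counter:#x}")
```

All three values fall in the same epoch (14), so the last child change on WORKERS and the rejected delete were issued under the same leader epoch.
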
The debug log showed this on the machine initiating the delete:
2013-05-23 09:05:37,030 [myid:1] - DEBUG
[FollowerRequestProcessor:1:CommitProcessor@171] - Processing request::
sessionid:0x13ed17dc4310000 type:delete cxid:0x3 zxid:0xfffffffffffffffe
txntype:unknown reqpath:/ROOT_A/INSTANCES/10.244.43.240/WORKERS

2013-05-23 09:05:37,034 [myid:1] - DEBUG
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:CommitProcessor@161] - Committing
request:: sessionid:0x13ed17dc4310000 type:error cxid:0x3 zxid:0xe00200606
txntype:-1 reqpath:n/a

2013-05-23 09:05:37,034 [myid:1] - DEBUG
[CommitProcessor:1:FinalRequestProcessor@88] - Processing request::
sessionid:0x13ed17dc4310000 type:delete cxid:0x3 zxid:0xe00200606
txntype:-1 reqpath:/ROOT_A/INSTANCES/10.244.43.240/WORKERS
2013-05-23 09:05:37,034 [myid:1] - DEBUG [CommitProcessor:1:DataTree@949] -
Ignoring processTxn failure hdr: -1 : error: -111

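The numeric code in that last line can be mapped back to a name. In ZooKeeper's KeeperException.Code, -111 is NOTEMPTY, whose message is "Directory not empty". The sketch below uses a small hand-copied subset of those codes; the table and the decode_error helper are illustrative, not something ZooKeeper ships.

```python
import re

# Hand-copied subset of ZooKeeper's KeeperException.Code values.
KEEPER_CODES = {
    -101: "NONODE",
    -103: "BADVERSION",
    -110: "NODEEXISTS",
    -111: "NOTEMPTY",
    -112: "SESSIONEXPIRED",
}

def decode_error(log_line):
    """Return the symbolic name for the 'error: N' code in a log line, or None."""
    match = re.search(r"error:\s*(-?\d+)", log_line)
    if match is None:
        return None
    code = int(match.group(1))
    # Codes outside the subset above are reported rather than guessed.
    return KEEPER_CODES.get(code, f"UNKNOWN({code})")

line = "Ignoring processTxn failure hdr: -1 : error: -111"
print(decode_error(line))  # prints NOTEMPTY
```

So the server that received the delete was reporting the same "not empty" condition that the command-line client printed.
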
And on another node I saw this:
2013-05-23 09:11:22,373 [myid:2] - INFO  [ProcessThread(sid:2
cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when
processing sessionid:0x13ed18295a60000 type:delete cxid:0x2
zxid:0xe0020060b txntype:-1 reqpath:n/a Error
Path:/ROOT_A/INSTANCES/10.244.43.240/WORKERS Error:KeeperErrorCode =
Directory not empty for /ROOT_A/INSTANCES/10.244.43.240/WORKERS

The third node logged nothing in response to the delete.

The problem went away after I restarted the first two machines one at a
time, so I'm left with a "working" system and an uneasy feeling.  Was there
anything I could have done other than restart the servers?

Thanks.
--
http://about.me/BrianTarbox