Zookeeper, mail # user - Python C binding or C API has password byte mishandling bug


Ben Bangert 2012-08-22, 23:38
This was discovered while testing session expiration, following a prior Zookeeper users thread. On deeper investigation, it seems the second client cannot connect to Zookeeper because it is supplying the wrong password. The connection is dropped and the client gets an EXPIRED_SESSION_STATE even though the session has not actually expired; the real problem is a bad password.

I've changed the subject so that this gets the attention the likely bug deserves.

I dropped additional debug statements into the C API to echo the password out, and the password it gets is not what Python sees. For one password the C API received, the Python API had *no password at all*; in other cases it has fewer than 16 bytes, which is clearly a bug as the password is always 16 bytes.

I tweaked the C layer to output this after connection:
                    LOG_INFO(("Password is: %02x", zh->client_id.passwd));

Granted, I should probably use a better print format specifier; I'm a C newb, unfortunately. This is what I see in my logs during the problem:

ZooKeeper: INFO: check_events@1747: session establishment complete on server [127.0.0.1:20000], sessionId=0x139507b99fe00c6, negotiated timeout=10000
ZooKeeper: INFO: check_events@1748: Password is: 2502978
kazoo.testing: DEBUG: Password is:

Note that, yes, there should be a password on the kazoo.testing line; that is where I'm printing the password as hex in Python. The problem is that there *is no password* available from Python. The Zookeeper server generates 16 random bytes for the password, and by the time it gets to Python... there's nothing.
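
A check along the following lines can make the mismatch visible on the Python side. This is only a minimal sketch: `describe_password` is a hypothetical helper, not part of kazoo or the zookeeper binding, and the only fact it relies on from this thread is that a valid session password is exactly 16 bytes.

```python
import binascii

# Per this thread, the server always generates a 16-byte password.
EXPECTED_PASSWORD_LENGTH = 16


def describe_password(passwd):
    """Return (hex_string, is_complete) for a session password.

    Hypothetical helper for debugging; assumes `passwd` is a bytes-like
    object as handed back by the client binding.
    """
    raw = bytes(passwd)
    hex_string = binascii.hexlify(raw).decode("ascii")
    return hex_string, len(raw) == EXPECTED_PASSWORD_LENGTH
```

Logging both values, rather than only the hex string, distinguishes an empty password from a logging problem.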

This doesn't happen every time, which has me thinking that something in the C binding mistranslates the password, truncating it when it contains certain characters. I've noticed that when the test fails, the password Python sees is between 0 and 15 bytes long, not the 16 it should be.
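
One mechanism that would fit these symptoms, though nothing in this thread confirms it, is that the 16 raw bytes are read as a NUL-terminated C string somewhere between the C API and Python. The sketch below uses `ctypes` purely to imitate that behavior; `as_c_string` and `chance_of_zero_byte` are hypothetical names introduced for illustration.

```python
import ctypes

PASSWORD_LENGTH = 16


def as_c_string(passwd):
    """Read a 16-byte buffer the way a NUL-terminated C string is read."""
    buf = ctypes.create_string_buffer(passwd, PASSWORD_LENGTH)
    # .value stops at the first zero byte; .raw would keep all 16 bytes.
    return buf.value


def chance_of_zero_byte(length=PASSWORD_LENGTH):
    """Probability that `length` uniformly random bytes include a zero."""
    return 1 - (255 / 256) ** length


# A zero in the first position leaves nothing at all on the Python side.
print(len(as_c_string(b"\x00" + b"\xff" * 15)))
# A zero in the third position leaves only the first two bytes.
print(len(as_c_string(b"\x01\x02\x00" + b"\xff" * 13)))
print(chance_of_zero_byte())
```

Under this assumption, the result can be anywhere from 0 to 15 bytes long, and only about 6% of uniformly random passwords would be affected, which would be consistent with the failure being intermittent.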

At this point, I don't know exactly where the bytes are being dropped. Digging through two layers is also very discouraging. Having a pure Python API to talk to Zookeeper has suddenly become much more attractive.

Cheers,
Ben