Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Zookeeper, mail # dev - Possible bug - xid mismatch, out of order responses, "unexpected server response: expected 0x..., but recieved 0x..."


Copy link to this message
-
Re: Possible bug - xid mismatch, out of order responses, "unexpected server response: expected 0x..., but recieved 0x..."
Dutch Meyer 2014-01-15, 16:55
Just to close the loop here - this is a race in upstream zookeeper.  I'll
open a jira ticket today.

In the commit processor, if we are at the primary request handler on line
167:

                while (!stopped && !isWaitingForCommit() &&
                       !isProcessingCommit() &&
                       (request = queuedRequests.poll()) != null) {
                    if (needCommit(request)) {
                        nextPending.set(request);
                    } else {
                        sendToNextProcessor(request);
                    }
                }

A request can be handled in this block and be quickly processed and
completed on another thread. If queuedRequests is empty, we then exit the
block. Next, before this thread makes any more progress, we can get 2 more
requests, one get_children(say), and a sync placed on queuedRequests for
the processor. Then, if we are very unlucky, the sync request can complete
and this object's commit() routine is called (from
FollowerZookeeperServer), which places the sync request on the previously
empty committedRequests queue. At that point, this thread continues.

We reach line 182, which is a check on sync requests.

                if (!stopped && !isProcessingRequest() &&
                    (request = committedRequests.poll()) != null) {

Here we are not processing any requests, because the original request above
has completed. We haven't dequeued either the read or the sync request in
this processor. Next, the poll above will pull the sync request off the
queue, and in the following block, the sync will get forwarded to the next
processor.

This is a problem because the read request hasn't been forwarded yet, so
requests are now out of order.

--Dutch

On Mon, Jan 13, 2014 at 3:16 PM, Dutch Meyer <[EMAIL PROTECTED]> wrote:

> I have another instance of this issue while running at DEBUG trace level.
>
>
> ----------------------------------------------------------------------------------------------------------------------------
> 2014-01-01 12:55:12,278 - WARN  [NIOWorkerThread-4:NIOServerCnxn@362][] -
> Unable to read additional data from client sessionid 0x200000209eb0002,
> likely client has closed socket
> 2014-01-01 12:55:12,278 - INFO  [NIOWorkerThread-4:NIOServerCnxn@1000][]
> - Closed socket connection for client /10.11.19.2:54313 which had
> sessionid 0x200000209eb0002
> 2014-01-01 12:55:12,279 - DEBUG [NIOWorkerThread-4:NIOServerCnxn@1036][]
> - ignoring exception during input shutdown
> java.net.SocketException: Transport endpoint is not connected
>         at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
>         at
> sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:658)
>         at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:378)
>         at
> org.apache.zookeeper.server.NIOServerCnxn.closeSock(NIOServerCnxn.java:1032)
>         at
> org.apache.zookeeper.server.NIOServerCnxn.closeSock(NIOServerCnxn.java:1005)
>         at
> org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:989)
>         at
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:364)
>         at
> org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:530)
>         at
> org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:152)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:679)
> 2014-01-01 12:55:12,799 - INFO
>  [NIOServerCxnFactory.AcceptThread:/10.11.19.2:2181
> :NIOServerCnxnFactory$AcceptThread@296][] - Accepted socket connection
> from /10.11.19.2:54326
> 2014-01-01 12:55:12,799 - DEBUG [NIOWorkerThread-1:ZooKeeperServer@761][]
> - Session establishment request from client /10.11.19.2:54326 client's
> lastZxid is 0x2000003b3
> 2014-01-01 12:55:12,800 - INFO  [NIOWorkerThread-1:ZooKeeperServer@816][]

-=-Dutch