On Wed, Sep 12, 2012 at 10:40 AM, Tom Brown <[EMAIL PROTECTED]> wrote:
> I have captured some logs from what is happening during one of these pauses.
> Can someone help me figure out what's actually going on from these logs?
> --- My interpretation of the logs ---
> As you can see at the start of the logs, my coprocessor for updating
> the data is executing rapidly until 10:17:06.
> At that time the coprocessor for querying is invoked. This query
> should take only moments to return, but doesn't return until 10:44:52.
A stack trace (thread dump) from the regionserver where the CP is
executing would help here: it would show where the RPC threads
servicing the CP invocations are hung up.
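
In case it helps, here is a rough sketch of what such a dump contains, written against the JVM's standard ThreadMXBean. It only dumps the JVM it runs in, so it is an illustration and not a way to inspect a remote regionserver. For a running regionserver, the JDK's jstack tool pointed at the process ID produces the same kind of output from outside. The class and method names (ThreadDumpSketch, dump) are made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumpSketch {

    /** Builds a plain-text dump of every live thread in the current JVM. */
    static String dump() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip lock details, which not every JVM supports.
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            // Each frame shows where the thread currently is; a hung RPC
            // handler would show the call it is blocked in at the top.
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

When reading a real dump, the threads to look for are the RPC handler threads, and the interesting part is the top few frames of each one.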
> At 10:18:53 there appear to be some compaction related messages
> (though they didn't appear to be the cause, happening over a minute
> after the server stops functioning).
> It appears to run compaction until 10:42:25. The next two minutes
> contain just LRU eviction messages.
> At 10:44:52, the query from earlier appears to complete, after having
> summarized only 863 rows. A few other queued requests are attempted,
> but fail with exceptions (ClosedChannelException).
> Eventually the exceptions are being thrown from "openScanner", which
> really doesn't sound good to me.
The ClosedChannelExceptions appear to come from RPC service threads
that, now unstuck, are processing the queued-up CP invocations. By then
the client has given up, so the threads can't write back results and
error out.
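
To illustrate the failure mode, here is a small sketch in plain java.nio. It is not HBase code and makes no claim about how the HBase RPC server is actually structured. It assumes the server side has already closed its end of the connection after the client went away, and that a handler thread then tries to write a late result. The names (LateResponseSketch, writeAfterClose) are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class LateResponseSketch {

    /** Returns true if the late write failed with ClosedChannelException. */
    static boolean writeAfterClose() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 asks the OS for any free port on the loopback interface.
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            SocketChannel client = SocketChannel.open(server.getLocalAddress());
            SocketChannel serverSide = server.accept();

            client.close();      // the client times out and gives up
            serverSide.close();  // assumed: server cleans up the dead connection

            try {
                // A handler that was stuck now tries to send its result.
                serverSide.write(ByteBuffer.wrap("result".getBytes()));
                return false;
            } catch (ClosedChannelException e) {
                return true;     // nowhere to write the response
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("late write failed: " + writeAfterClose());
    }
}
```

If that is what is happening, the exceptions are a symptom of the earlier stall and not a separate problem.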
Problems worthy of attack prove their worth by hitting back. - Piet
Hein (via Tom White)