Re: Problem connecting to region server
Yi Liang 2012-03-01, 05:12
The thread holding the lock:
"IPC Reader 0 on port 60020" prio=10 tid=0x00007f983c1aa800 nid=0x1ae9
waiting on condition [0x00007f983a915000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000041d9f51f0> (a
- locked <0x000000041d964510> (a
I also put the whole dump here: http://pastebin.com/f9BcrXUP
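
For anyone who wants to do the "which thread holds it?" search without reading the dump by eye, here is a minimal sketch in Python. It assumes the dump has the usual jstack layout, with each thread starting with its quoted name and monitor lines reading "- locked <addr>" or "- waiting to lock <addr>". The function name find_blockers and the SAMPLE text are just illustrative (SAMPLE is trimmed from the entries in this thread); none of this is part of HBase or the JDK.

```python
import re
from collections import defaultdict

# A thread entry starts with its quoted name, e.g. "IPC Reader 0 on port 60020" prio=10 ...
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
# Monitor lines of interest: "- locked <0x...>" and "- waiting to lock <0x...>"
MONITOR_LINE = re.compile(
    r'-\s+(?P<action>locked|waiting to lock)\s+<(?P<addr>0x[0-9a-fA-F]+)>'
)


def find_blockers(dump_text):
    """Map each contended monitor address to the threads holding and waiting on it."""
    holders = defaultdict(list)
    waiters = defaultdict(list)
    current = None
    for raw in dump_text.splitlines():
        line = raw.strip()
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group("name")
            continue
        monitor = MONITOR_LINE.search(line)
        if monitor and current is not None:
            table = holders if monitor.group("action") == "locked" else waiters
            table[monitor.group("addr")].append(current)
    # Report only monitors that at least one thread is waiting to lock.
    return {
        addr: {"held_by": holders.get(addr, []), "waiting": names}
        for addr, names in waiters.items()
    }


# Trimmed from the two thread entries quoted in this thread (truncated lines kept as-is).
SAMPLE = '''
"IPC Server listener on 60020" daemon prio=10 tid=0x00007f983c57a800 nid=0x1b12 waiting for monitor entry [0x00007f98388f4000]
   java.lang.Thread.State: BLOCKED (on object monitor)
   - waiting to lock <0x000000041d964510> (a

"IPC Reader 0 on port 60020" prio=10 tid=0x00007f983c1aa800 nid=0x1ae9 waiting on condition [0x00007f983a915000]
   java.lang.Thread.State: WAITING (parking)
   at sun.misc.Unsafe.park(Native Method)
   - parking to wait for <0x000000041d9f51f0> (a
   - locked <0x000000041d964510> (a
'''

if __name__ == "__main__":
    for addr, info in find_blockers(SAMPLE).items():
        print(addr, "held by", info["held_by"], "blocking", info["waiting"])
```

Note that this only follows object monitors. A thread that is "parking to wait for" a java.util.concurrent lock will not show its owner this way, so the Reader thread's own wait still has to be traced separately.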
About the socket timeout exceptions in the RS log: we have actually seen them
before, sometimes likely caused by datanode block reports, but they had never
caused the region server to stop responding. I will have a look at the datanode
log to double check. What does "maxing your disks" mean here?
On Thu, Mar 1, 2012 at 3:42 AM, Jean-Daniel Cryans <[EMAIL PROTECTED]>wrote:
> There's a lot going on in there, and considering that I don't know if your
> selection of thread dumps/logs is the right one, my suggestions might
> be wrong.
> So in that thread dump the Listener thread is blocked on
> 0x000000041d964510, have you searched which thread holds it?
> Most of the time (almost 100% in my experience), getting the socket
> timeout client-side means you need to look at the "IPC Server handler"
> threads in the dump since this is where the client queries are
> Regarding your log, it's getting socket timeouts from the
> Datanode-side. Were you maxing your disks? What was going on there?
> Hope this helps,
> On Tue, Feb 28, 2012 at 10:04 PM, Yi Liang <[EMAIL PROTECTED]> wrote:
> > We're running hbase 0.90.3 with hadoop cdh3u2. Today, we ran into a
> > problem connecting to one region server.
> > When running hbase hbck, the following error appeared:
> > Number of Tables: 16
> > Number of live region servers: 20
> > Number of dead region servers: 0
> > .12/02/29 13:06:58 INFO ipc.HbaseRPC: Problem connecting to server: /
> > 192.168.201.13:60020
> > ERROR: RegionServer: test13.xxx.com,60020,1327993969023 Unable to fetch
> > region information. java.net.SocketTimeoutException: Call to /
> > 192.168.201.13:60020 failed on socket timeout exception:
> > java.net.SocketTimeoutException: 60000 millis timeout while waiting for
> > channel to be ready for read. ch :
> > java.nio.channels.SocketChannel[connected local=/192.168.201.13:44956
> > 192.168.201.13:60020]
> > and the final status is INCONSISTENT. We have to kill the RS to recover
> > status.
> > In the jstack output of that regionserver process, we saw that the thread
> > "IPC Server listener on 60020" was blocked. We checked several times
> > over several minutes, but the state stayed BLOCKED:
> > "IPC Server listener on 60020" daemon prio=10 tid=0x00007f983c57a800
> > nid=0x1b12 waiting for monitor entry [0x00007f98388f4000]
> > java.lang.Thread.State: BLOCKED (on object monitor)
> > at
> > - waiting to lock <0x000000041d964510> (a