|
|
-
Problem connecting to region serverYi Liang 2012-02-29, 06:04
We're running hbase 0.90.3 with hadoop cdh3u2. Today, we ran into a problem
connecting to one region server. When running hbase hbck, the following error appeared: Number of Tables: 16 Number of live region servers: 20 Number of dead region servers: 0 .12/02/29 13:06:58 INFO ipc.HbaseRPC: Problem connecting to server: / 192.168.201.13:60020 ERROR: RegionServer: test13.xxx.com,60020,1327993969023 Unable to fetch region information. java.net.SocketTimeoutException: Call to / 192.168.201.13:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.201.13:44956remote=/ 192.168.201.13:60020] and the final status is INCONSISTENT. We have to kill the RS to recover the status. >From jstack output of that regionserver process, we saw the thread "IPC Server listener on 60020" had been blocked. We have tried several times in several minutes, but the state just kept as BLOCKED: "IPC Server listener on 60020" daemon prio=10 tid=0x00007f983c57a800 nid=0x1b12 waiting for monitor entry [0x00007f98388f4000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.registerChannel(HBaseServer.java:347) - waiting to lock <0x000000041d964510> (a org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doAccept(HBaseServer.java:496) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.run(HBaseServer.java:422) Had it caused the problem connecting to server? But why had it always been BLOCKED? Following is the RS log between the problem appeared and we killed the process. 2012-02-29 12:06:12,117 INFO org.apache.hadoop.hbase.regionserver.Store: Started compaction of 3 file(s) in cf=IndexInfo into hdfs:// test02.xxx.com:30070/offline-hbase/News/4dae1f8cd991f17414ca4d86ff0884ad/.tmp, seqid=423340578, totalSize=8.5m 2012-02-29 12:06:12,118 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compacting hdfs:// test02.xxx.com:30070/offline-hbase/News/4dae1f8cd991f17414ca4d86ff0884ad/IndexInfo/8324806988914852495, keycount=122337, bloomtype=NONE, size=8.4m 2012-02-29 12:06:12,118 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compacting hdfs:// test02.xxx.com:30070/offline-hbase/News/4dae1f8cd991f17414ca4d86ff0884ad/IndexInfo/1116030618027381242, keycount=258, bloomtype=NONE, size=17.7k 2012-02-29 12:06:12,118 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compacting hdfs:// test02.xxx.com:30070/offline-hbase/News/4dae1f8cd991f17414ca4d86ff0884ad/IndexInfo/3755533953967637627, keycount=372, bloomtype=NONE, size=25.8k 2012-02-29 12:06:12,906 INFO org.apache.hadoop.hbase.regionserver.Store: Completed major compaction of 3 file(s), new file=hdfs:// test02.xxx.com:30070/offline-hbase/News/4dae1f8cd991f17414ca4d86ff0884ad/IndexInfo/3731399222200436246, size=8.5m; total size for store is 8.5m 2012-02-29 12:06:12,906 INFO org.apache.hadoop.hbase.regionserver.HRegion: completed compaction on region News,57addda034c334e4,1313319088489.4dae1f8cd991f17414ca4d86ff0884ad. after 7sec 2012-02-29 12:07:26,577 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_1313036207534951503_65938873 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry... 2012-02-29 12:07:50,103 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=2.49 GB, free=646.64 MB, max=3.12 GB, blocks=29616, accesses=80631725, hits=60715195, hitRatio=75.29%%, cachingAccesses=72673671, cachingHits=59497193, cachingHitsRatio=81.86%%, evictions=3584, evicted=13146860, evictedPerRun=3668.208740234375 2012-02-29 12:12:50,103 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=2.62 GB, free=518.67 MB, max=3.12 GB, blocks=30943, accesses=80640574, hits=60722719, hitRatio=75.30%%, cachingAccesses=72682520, cachingHits=59504717, cachingHitsRatio=81.86%%, evictions=3584, evicted=13146860, evictedPerRun=3668.208740234375 2012-02-29 12:15:06,937 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 319.74 MB of total=2.65 GB 2012-02-29 12:15:06,955 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=319.87 MB, total=2.34 GB, single=744.45 MB, multi=1.9 GB, memory=0 KB 2012-02-29 12:17:50,103 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=2.45 GB, free=692.03 MB, max=3.12 GB, blocks=28911, accesses=80652333, hits=60732703, hitRatio=75.30%%, cachingAccesses=72694279, cachingHits=59514701, cachingHitsRatio=81.86%%, evictions=3585, evicted=13150645, evictedPerRun=3668.2412109375 2012-02-29 12:18:52,867 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /192.168.201.23:50010 for file /offline-hbase/News/bec970594146b62ddf8bd450fc654acf/Content/621772312284239615 for block -1994034029269165490:java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.201.13:48546remote=/ 192.168.201.23:50010] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.DataInputStream.readShort(DataInputStream.java:295) at org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1462) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchBlockByteRange(DFSClient.java:2024) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2099) at |