Problem connecting to region server
Yi Liang 2012-02-29, 06:04
We're running HBase 0.90.3 with Hadoop CDH3u2. Today we ran into a problem
connecting to one region server.

When running hbase hbck, the following error appeared:
Number of Tables: 16
Number of live region servers: 20
Number of dead region servers: 0
.12/02/29 13:06:58 INFO ipc.HbaseRPC: Problem connecting to server: /192.168.201.13:60020
ERROR: RegionServer: test13.xxx.com,60020,1327993969023 Unable to fetch region information. java.net.SocketTimeoutException: Call to /192.168.201.13:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.201.13:44956 remote=/192.168.201.13:60020]

and the final status was INCONSISTENT. We had to kill the RS to recover.
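
Since the failure above is a read timeout rather than a connection refusal, something like the little probe below is what we have in mind for telling a stuck accept loop from a dead process (this is our own throwaway sketch, not anything from HBase; the host and port are the ones from the hbck output, and the 5-second timeout is arbitrary). The idea is that if the process is up but its accept loop is stuck, the kernel keeps accepting new connections only until the listen backlog fills, after which plain connects start hanging or timing out as well.

// Throwaway probe (not HBase code): does the RS port still accept plain TCP
// connections, and how long does connect() take?
import java.net.InetSocketAddress;
import java.net.Socket;

public class RsPortProbe {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "192.168.201.13"; // RS from the hbck output
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 60020;

        for (int i = 0; i < 10; i++) {
            Socket socket = new Socket();
            long start = System.currentTimeMillis();
            try {
                socket.connect(new InetSocketAddress(host, port), 5000); // 5s connect timeout (arbitrary)
                System.out.println("connect #" + i + " ok after "
                        + (System.currentTimeMillis() - start) + " ms");
            } catch (Exception e) {
                System.out.println("connect #" + i + " failed after "
                        + (System.currentTimeMillis() - start) + " ms: " + e);
            } finally {
                socket.close();
            }
            Thread.sleep(1000);
        }
    }
}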

From the jstack output of that region server process, we saw that the thread "IPC
Server listener on 60020" was blocked. We took jstack dumps several times over a
few minutes, but the state stayed BLOCKED:

"IPC Server listener on 60020" daemon prio=10 tid=0x00007f983c57a800
nid=0x1b12 waiting for monitor entry [0x00007f98388f4000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at
org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.registerChannel(HBaseServer.java:347)
        - waiting to lock <0x000000041d964510> (a
org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader)
        at
org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doAccept(HBaseServer.java:496)
        at
org.apache.hadoop.hbase.ipc.HBaseServer$Listener.run(HBaseServer.java:422)

Could this be what caused the problem connecting to the server? And why did it
stay BLOCKED the whole time?
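
To check that we are reading the jstack right: registerChannel() is synchronized on the Reader the Listener hands the new connection to, and the Reader thread seems to hold its own monitor for essentially its whole read loop, releasing it only briefly while a registration is pending. So the picture looks roughly like the pattern below (a stripped-down sketch of our understanding, not the real org.apache.hadoop.hbase.ipc.HBaseServer source; the names are only for illustration). If the Reader ever blocks inside its loop while holding the monitor, the Listener waits forever in registerChannel(), nothing gets past accept(), and every new client eventually hits the 60-second socket timeout, which would match both the hbck error and the BLOCKED state above.

// Illustrative sketch only, not the actual HBaseServer code.
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

class Reader implements Runnable {
    private final Selector readSelector;
    private volatile boolean adding = false;

    Reader(Selector readSelector) { this.readSelector = readSelector; }

    // The Listener brackets registerChannel() with these so the reader briefly
    // gives up its monitor (via wait() below) to let the registration in.
    public void startAdd() { adding = true; readSelector.wakeup(); }
    public synchronized void finishAdd() { adding = false; notify(); }

    // Needs this Reader's monitor, i.e. the lock the jstack shows the Listener
    // "waiting to lock".
    public synchronized SelectionKey registerChannel(SocketChannel channel) throws IOException {
        return channel.register(readSelector, SelectionKey.OP_READ);
    }

    public synchronized void run() {
        while (true) {
            try {
                readSelector.select();
                while (adding) {
                    wait(1000); // the only place the monitor is released
                }
                for (SelectionKey key : readSelector.selectedKeys()) {
                    // doRead(key): if reading/dispatching ever blocks here, the
                    // monitor is never released again and the Listener stays
                    // BLOCKED in registerChannel().
                }
                readSelector.selectedKeys().clear();
            } catch (Exception e) {
                return;
            }
        }
    }
}

class Listener implements Runnable {
    private final ServerSocketChannel acceptChannel;
    private final Reader[] readers;
    private int currentReader = 0;

    Listener(ServerSocketChannel acceptChannel, Reader[] readers) {
        this.acceptChannel = acceptChannel;
        this.readers = readers;
    }

    public void run() {
        while (true) {
            try {
                SocketChannel channel = acceptChannel.accept();   // doAccept()
                channel.configureBlocking(false);
                Reader reader = readers[currentReader++ % readers.length];
                reader.startAdd();
                reader.registerChannel(channel);                  // blocked here in our jstack
                reader.finishAdd();
            } catch (IOException e) {
                return;
            }
        }
    }
}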

Below is the RS log from when the problem appeared until we killed the process.

2012-02-29 12:06:12,117 INFO org.apache.hadoop.hbase.regionserver.Store: Started compaction of 3 file(s) in cf=IndexInfo  into hdfs://test02.xxx.com:30070/offline-hbase/News/4dae1f8cd991f17414ca4d86ff0884ad/.tmp, seqid=423340578, totalSize=8.5m
2012-02-29 12:06:12,118 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compacting hdfs://test02.xxx.com:30070/offline-hbase/News/4dae1f8cd991f17414ca4d86ff0884ad/IndexInfo/8324806988914852495, keycount=122337, bloomtype=NONE, size=8.4m
2012-02-29 12:06:12,118 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compacting hdfs://test02.xxx.com:30070/offline-hbase/News/4dae1f8cd991f17414ca4d86ff0884ad/IndexInfo/1116030618027381242, keycount=258, bloomtype=NONE, size=17.7k
2012-02-29 12:06:12,118 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compacting hdfs://test02.xxx.com:30070/offline-hbase/News/4dae1f8cd991f17414ca4d86ff0884ad/IndexInfo/3755533953967637627, keycount=372, bloomtype=NONE, size=25.8k
2012-02-29 12:06:12,906 INFO org.apache.hadoop.hbase.regionserver.Store: Completed major compaction of 3 file(s), new file=hdfs://test02.xxx.com:30070/offline-hbase/News/4dae1f8cd991f17414ca4d86ff0884ad/IndexInfo/3731399222200436246, size=8.5m; total size for store is 8.5m
2012-02-29 12:06:12,906 INFO org.apache.hadoop.hbase.regionserver.HRegion: completed compaction on region News,57addda034c334e4,1313319088489.4dae1f8cd991f17414ca4d86ff0884ad. after 7sec
2012-02-29 12:07:26,577 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_1313036207534951503_65938873 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
2012-02-29 12:07:50,103 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=2.49 GB, free=646.64 MB, max=3.12 GB, blocks=29616, accesses=80631725, hits=60715195, hitRatio=75.29%%, cachingAccesses=72673671, cachingHits=59497193, cachingHitsRatio=81.86%%, evictions=3584, evicted=13146860, evictedPerRun=3668.208740234375
2012-02-29 12:12:50,103 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=2.62 GB, free=518.67 MB, max=3.12 GB, blocks=30943, accesses=80640574, hits=60722719, hitRatio=75.30%%, cachingAccesses=72682520, cachingHits=59504717, cachingHitsRatio=81.86%%, evictions=3584, evicted=13146860, evictedPerRun=3668.208740234375
2012-02-29 12:15:06,937 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 319.74 MB of total=2.65 GB
2012-02-29 12:15:06,955 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=319.87 MB, total=2.34 GB, single=744.45 MB, multi=1.9 GB, memory=0 KB
2012-02-29 12:17:50,103 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=2.45 GB, free=692.03 MB, max=3.12 GB, blocks=28911, accesses=80652333, hits=60732703, hitRatio=75.30%%, cachingAccesses=72694279, cachingHits=59514701, cachingHitsRatio=81.86%%, evictions=3585, evicted=13150645, evictedPerRun=3668.2412109375
2012-02-29 12:18:52,867 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /192.168.201.23:50010 for file /offline-hbase/News/bec970594146b62ddf8bd450fc654acf/Content/621772312284239615 for block -1994034029269165490:java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.201.13:48546 remote=/192.168.201.23:50010]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
        at java.io.DataInputStream.readShort(DataInputStream.java:295)
        at org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1462)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchBlockByteRange(DFSClient.java:2024)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2099)
        at