Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
HBase >> mail # dev >> scanner lease expired/region server shutdown


Copy link to this message
-
scanner lease expired/region server shutdown
I was doing some brute force testing - running one instance of PerformanceEvaluation (PE) for writes, and another instance for randomReads.

One of the region servers went down after a while. [This is on 0.20.2]. The region server log had things like:

2010-01-25 11:33:39,416 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 34 on 60020 took 65190ms appending an edit to hlog; editcount=27878
2010-01-25 11:33:39,416 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1815.4441MB (1903631104), Free=581.23096MB (609464832),\
 Max=2396.675MB (2513095936), Counts: Blocks=28216, Access=10904689, Hit=9313612, Miss=1591077, Evictions=110, Evicted=414336, Ratios: Hit Ratio=85.40924191474\
915%, Miss Ratio=14.590759575366974%, Evicted/Run=3766.69091796875
2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 52 on 60020 took 65195ms appending an edit to hlog; editcount=27879
2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 75 on 60020 took 65195ms appending an edit to hlog; editcount=27880
2010-01-25 11:33:39,417 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -5220035164735014996 lease expired

This was followed by:
2010-01-25 11:33:39,636 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x12652c620520001 to sun.nio.ch.SelectionKeyImpl@788ab708<mailto:sun.nio.ch.SelectionKeyImpl@788ab708>
java.io.IOException: TIMED OUT

And also errors like:
2010-01-25 11:33:39,692 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
org.apache.hadoop.hbase.UnknownScannerException: Name: -5220035164735014996

Eventually, the region server shutdown.
2010-01-25 11:33:40,293 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 85 on 60020, call get([B@2eec23d2, row=0007555231, maxVersions=1, timeRange=[0,9223372036854775807), families={(family=info, columns={data}}) from 10.131.1.186:51645: error: java.io.IOException: Server not running, aborting

Looks like for some reason the region server got overloaded, and was not able to finish the scan (I am assuming this is the period meta/root region scan initiated by the master).

Any thoughts/ideas on the above?

Subsequently, I restarted the region server and just started the PE in sequentialWrite mode. It seems to be off to a really slow start. For several regions, it is printing the following:

10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 0 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
[..and so on..]
10/01/25 18:12:59 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 8 of 10 failed; retrying after sleep of 16000 because: region offline: TestTable,0031841132,1264121653303
10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999

When I scan the '.META.', I noticed that most of the regions for which I get the above error were in the middle of (?) a split. For example:

TestTable,0031841132,126412 column=info:regioninfo, timestamp=1264470099796, value=REGION => {NAME => 'TestTa
 1653303                     ble,0031841132,1264121653303', STARTKEY => '0031841132', ENDKEY => '0032057774',
                             ENCODED => 644500792, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'TestTab
                             le', FAMILIES => [{NAME => 'info', VERSIONS => '3', COMPRESSION => 'NONE', TTL =>
                              '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]
                             }}
 TestTable,0031841132,126412 column=info:server, timestamp=1264121655148, value=10.129.68.214:60020
 1653303
 TestTable,0031841132,126412 column=info:serverstartcode, timestamp=1264121655148, value=1264109117245
 1653303
 TestTable,0031841132,126412 column=info:splitA, timestamp=1264470099796, value=\000\n0031978284\000\000\000\0
 1653303                     01&g\016\357\277\275\024\"TestTable,0031841132,1264449401108\000\n0031841132\000\
                             000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
                             05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
                             00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
                             00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
                             000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
                             EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004true\357\277\27
                             5\337\274#
 TestTable,0031841132,126412 column=info:splitB, timestamp=1264470099796, value=\000\n0032057774\000\000\000\0
 1653303                     01&g\016\357\277\275\024\"TestTable,0031978284,1264449401108\000\n0031978284\000\
                             000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
                             05false\000\000\000
+
Jean-Daniel Cryans 2010-01-26, 03:23
+
Stack 2010-01-26, 04:59
+
Kannan Muthukkaruppan 2010-01-26, 15:01
+
Kannan Muthukkaruppan 2010-01-26, 15:10
+
Stack 2010-01-26, 16:31
+
Kannan Muthukkaruppan 2010-01-26, 20:56
+
Stack 2010-01-27, 00:03
+
Dhruba Borthakur 2010-01-26, 18:10
+
Stack 2010-01-26, 18:24
+
Kannan Muthukkaruppan 2010-01-26, 20:58
+
Jean-Daniel Cryans 2010-01-26, 21:53
+
Kannan Muthukkaruppan 2010-01-27, 20:27
+
Stack 2010-01-26, 17:20
+
Kannan Muthukkaruppan 2010-01-27, 22:30
+
Stack 2010-01-27, 22:49
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB