Long waiting loop for "Waiting for region servers count to settle" when doing HMaster failover
Hi Community,

While testing, I ran into this issue on HBase 0.94.3.

The cluster has 1 active HMaster, 1 backup HMaster, and 4 region servers.
I ran a YCSB workload on it to load data. While the workload was running,
I manually killed the active HMaster with kill -9. The backup master seemed
to take over the active role quickly, but then it kept looping on

"
INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region
servers count to settle; currently checked in 0, slept for 0 ms,
expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms,
interval of 1500 ms.
INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region
servers count to settle; currently checked in 0, slept for xxx ms,
expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms,
interval of 1500 ms.
INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region
servers count to settle; currently checked in 0, slept for xxx ms,
expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms,
interval of 1500 ms.
...
...
...
<for about 5 - 7 mins looping on this log message>
...

INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region
servers count to settle; currently checked in 1, slept for 0 ms,
expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms,
interval of 1500 ms.

INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region
servers count to settle; currently checked in 2, slept for 0 ms,
expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms,
interval of 1500 ms.
INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region
servers count to settle; currently checked in 3, slept for 0 ms,
expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms,
interval of 1500 ms.
INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region
servers count to settle; currently checked in 4, slept for 0 ms,
expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms,
interval of 1500 ms.

"
There always seems to be a loop of 5 - 7 minutes on the above waiting
message before the region servers check in to the new active master. Then,
after this long wait, all 4 region servers suddenly check in
successfully.
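
To make the fields in the log message concrete, here is a minimal Python sketch of how a settle loop of this kind could behave. It is a simplified model built only from the values printed in the log (minimum, maximum, timeout, interval), not the actual ServerManager code. The loop condition is an assumption, and the name `settle_wait` and the `max_wait_ms` guard are hypothetical.

```python
def settle_wait(counts, min_to_start=1, max_to_start=2147483647,
                timeout_ms=4500, interval_ms=1500, max_wait_ms=600000):
    """Simulate a "wait for region servers count to settle" loop.

    counts[i] is the number of region servers checked in after i sleeps;
    the last value repeats. Returns (slept_ms, final_count).
    max_wait_ms is a hypothetical guard that keeps the simulation finite.
    """
    slept = 0          # total simulated sleep time in ms
    last_change = 0    # simulated time of the last change in the count
    i = 0
    count = counts[0]
    # Assumed condition: keep waiting while the count changed recently,
    # the timeout has not elapsed, or the minimum has not been reached.
    while (slept < max_wait_ms and count < max_to_start and
           (last_change + interval_ms > slept
            or timeout_ms > slept
            or count < min_to_start)):
        slept += interval_ms
        i += 1
        new_count = counts[min(i, len(counts) - 1)]
        if new_count != count:
            last_change = slept
            count = new_count
    return slept, count
```

Under this model, a count that stays at 0 keeps the loop running past the 4500 ms timeout, because the minimum of 1 is never reached. That would match the log above, where the message repeats with "checked in 0" until the region servers finally report to the new master.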

Any idea what causes this waiting loop? Thanks a lot for any advice.
-- Best Regards, Julian