-
A region server stopped (timeout after trying to connect local Zookeeper)
ac@...) 2012-11-21, 11:16

Hi,

Please help!

HBase version: 0.94
ZooKeeper: 3.4.4

One of the region servers stopped very soon after HBase was started.

### Checked jps right after the HBase cluster was started; the HRegionServer process was there (*** there is no ZooKeeper instance running on this server ***)
$ jps
24767 Jps
18418 TaskTracker
24678 HRegionServer
18156 DataNode

### Waited a while and checked jps again; the HRegionServer process was gone
$ jps
18418 TaskTracker
24784 Jps
18156 DataNode

### Settings in hbase-site.xml (enabled hbase.cluster.distributed, set up 3 ZooKeepers, timeout = 60000)
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>

<property>
  <name>hbase.ZooKeeper.quorum</name>
  <value>m146,m145,m143</value>
</property>

<property>
  <name>zookeeper.session.timeout</name>
  <value>60000</value>
</property>

### hbase-env.sh also tells HBase not to manage a local instance of ZooKeeper
export HBASE_MANAGES_ZK=false

### This server can connect to the 3 ZooKeepers
./zkCli.sh -server m145,m146,m143 ==> [zk: m145,m146,m143(CONNECTED) 0]

### Checked the HBase log file and found something odd: it seems to have tried to connect to a local ZooKeeper
2012-11-21 17:30:33,066 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=60000 watcher=regionserver:60020
2012-11-21 17:31:33,254 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
2012-11-21 17:31:33,254 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 2000ms before retry #1...
2012-11-21 17:32:33,262 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 60010ms for sessionid 0x0, closing socket connection and attempting reconnect
2012-11-21 17:32:33,362 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
......
2012-11-21 17:34:33,570 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries
2012-11-21 17:34:33,571 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: regionserver:60020 Unable to set watcher on znode /hbase/master
2012-11-21 17:34:33,573 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: regionserver:60020 Received unexpected KeeperException, re-throwing exception
2012-11-21 17:34:33,573 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server ......
2012-11-21 17:34:33,576 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: []
2012-11-21 17:34:36,580 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server m144,60020,1353490232962: Initialization of RS failed. Hence aborting RS.
java.io.IOException: Received the shutdown message while waiting.
    at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:623)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:598)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:560)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:669)
    at java.lang.Thread.run(Thread.java:662)
2012-11-21 17:34:36,581 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: []

Please help!

QUESTION: Is this a bug, or do I need to check something else?

Thanks
-
Re: A region server stopped (timeout after trying to connect local Zookeeper)
ac@...) 2012-11-21, 13:29

Hi,

I have the following line in /etc/hosts on all servers. Should I keep it, comment it out, or ...?

127.0.0.1 localhost

Please help.

Thanks
-
Re: A region server stopped (timeout after trying to connect local Zookeeper)
Jean-Marc Spaggiari 2012-11-21, 17:22

Hi,

What do you have in your HBase configuration? Are you passing the names of the quorum servers?

$ cat conf/hbase-site.xml
......
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>cube,latitude,node3</value>
  <description>Comma separated list of servers in the ZooKeeper Quorum.
  For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
  By default this is set to localhost for local and pseudo-distributed modes
  of operation. For a fully-distributed setup, this should be set to a full
  list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
  this is the list of servers which we will start/stop ZooKeeper on.
  </description>
</property>
.....
-
Re: A region server stopped (timeout after trying to connect local Zookeeper)
ac@...) 2012-11-21, 23:13

Hi,

Here are my HBase configuration and test:

1) {$HBASE_HOME}hbase/conf/hbase-site.xml
<property>
  <name>hbase.ZooKeeper.quorum</name>
  <value>m146,m145,m143</value>
</property>

<property>
  <name>zookeeper.session.timeout</name>
  <value>60000</value>
</property>

2) {$HBASE_HOME}hbase/conf/hbase-env.sh
export HBASE_MANAGES_ZK=false

3) I used "{$ZK_HOME}/bin/zkCli.sh -server m145,m146,m143" to test the connection, and it worked:
[zk: m145,m146,m143(CONNECTED) 0]

4) In the logs, the connectString looked odd: the RegionServer did not use the "hbase.ZooKeeper.quorum" setting from conf/hbase-site.xml. It seemed to always use the default and tried to connect to "localhost:2181" in the distributed cluster:

2012-11-21 17:21:42,299 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=60000 watcher=regionserver:60020
...
2012-11-21 17:21:42,313 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (Unable to locate a login configura$
...
2012-11-21 17:21:42,316 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused
... (remark: it tried the above 3 times, then hit the FATAL error below)

2012-11-21 17:21:57,846 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: regionserver:60020 Received unexpected KeeperException, re-throwing exception
...
2012-11-21 17:21:57,847 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server
...

Please help.

Thanks
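The log check described in step 4 can be sketched in a few lines of Python. This is only an illustration: the function names `connect_hosts` and `uses_only_localhost` are made up for this example, and the pattern assumes the `connectString=...` field appears exactly as in the log lines quoted in this thread.

```python
import re

# Matches the connectString field of the ZooKeeper client's
# "Initiating client connection" log line (format as quoted in the thread).
CONNECT_RE = re.compile(r"connectString=(\S+)")


def connect_hosts(log_line):
    """Return the host:port entries found in a log line, or [] if none."""
    match = CONNECT_RE.search(log_line)
    if match is None:
        return []
    return match.group(1).split(",")


def uses_only_localhost(log_line):
    """True when every entry in the connect string points at the local host."""
    hosts = connect_hosts(log_line)
    local_names = ("localhost", "127.0.0.1")
    return bool(hosts) and all(h.split(":")[0] in local_names for h in hosts)


if __name__ == "__main__":
    line = ("2012-11-21 17:21:42,299 INFO org.apache.zookeeper.ZooKeeper: "
            "Initiating client connection, connectString=localhost:2181 "
            "sessionTimeout=60000 watcher=regionserver:60020")
    print(connect_hosts(line))
    print(uses_only_localhost(line))
```

In a fully distributed setup with an external quorum, a connect string that contains only localhost suggests the quorum setting was not picked up.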
-
Re: A region server stopped (timeout after trying to connect local Zookeeper)
Jean-Marc Spaggiari 2012-11-21, 23:51

Can you run jps on your master and look at the logs too?

Another thing: can you try with hbase.zookeeper.quorum instead of hbase.ZooKeeper.quorum?
-
Re: A region server stopped (timeout after trying to connect local Zookeeper)
ac@...) 2012-11-22, 00:14

Hi,

I changed the order of the ZooKeepers in the value of hbase.zookeeper.quorum from "m146,m145,m143" to "m143,m145,m146", raised the timeout from 60000 to 70000, and commented out the lzo property. It works now. Here is the diff:

1) $ diff hbase-site.xml hbase-site.xml.xxx
41,44c41,43
<
< <property>
< <name>hbase.zookeeper.quorum</name>
< <value>m143,m145,m146</value>
---
> <property>
> <name>hbase.ZooKeeper.quorum</name>
> <value>m146,m145,m143</value>
49c48,55
< <value>70000</value>
---
> <value>60000</value>
> </property>
>
> <!--
> /**
> <property>
> <name>hbase.regionserver.codecs</name>
> <value>lzo,gz</value>
50a57,58
> **/
> -->

The above is the only change made today.

2) HBase log:
2012-11-22 07:26:19,431 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=m145:2181,m143:2181,m146:2181 sessionTimeout=70000 watcher=regionserver:6$

I don't know why, but it works now. It seems that HBase somehow could not read hbase-site.xml correctly.

Thanks
-
Re: A region server stopped (timeout after trying to connect local Zookeeper)
Jean-Marc Spaggiari 2012-11-22, 00:39

I think the MAIN difference is the uppercase in the property name... It seems that property names in hbase-site.xml are case sensitive (which seems to be normal in the Java and Unix world).

You might want to retry by putting the uppercase back to see if this was the issue.

JM
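A check for this kind of mistake could look like the following Python sketch. It is not part of HBase: `find_case_mismatches` and `EXPECTED_NAMES` are hypothetical names, and the set of expected properties only contains the three names that appear in this thread, so it would need extending for real use.

```python
import xml.etree.ElementTree as ET

# Property names taken from this thread only; an assumption for illustration.
EXPECTED_NAMES = {
    "hbase.cluster.distributed",
    "hbase.zookeeper.quorum",
    "zookeeper.session.timeout",
}


def find_case_mismatches(xml_text, expected=EXPECTED_NAMES):
    """Return (found, expected) pairs for names that differ only by letter case."""
    by_lower = {name.lower(): name for name in expected}
    mismatches = []
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        wanted = by_lower.get(name.lower())
        if wanted is not None and wanted != name:
            mismatches.append((name, wanted))
    return mismatches


if __name__ == "__main__":
    sample = """
    <configuration>
      <property>
        <name>hbase.ZooKeeper.quorum</name>
        <value>m146,m145,m143</value>
      </property>
      <property>
        <name>zookeeper.session.timeout</name>
        <value>60000</value>
      </property>
    </configuration>
    """
    for found, wanted in find_case_mismatches(sample):
        print(f"{found!r} should probably be {wanted!r}")
```

The sample assumes the properties sit inside a `<configuration>` root element, which the excerpts in this thread do not show.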
-
Re: A region server stopped (timeout after trying to connect local Zookeeper)
ac@...) 2012-11-22, 01:53

Hi JM,

Thank you! It is case sensitive indeed: simply changing to a lowercase 'z' brings back ALL RegionServers (and an uppercase 'Z' could bring them all down too). I spent a few hours looking at other areas and hadn't realized this 'Z' effect.

Thanks again.
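The 'Z' effect can be modelled with a toy lookup in Python. This is a simplified sketch, not HBase's actual configuration code: it assumes settings are looked up by exact, case-sensitive key and that the quorum falls back to localhost when the key is missing, as the property description quoted earlier in the thread states. `get_setting` and `DEFAULTS` are hypothetical names.

```python
# Default taken from the property description quoted in this thread:
# "By default this is set to localhost".
DEFAULTS = {"hbase.zookeeper.quorum": "localhost"}


def get_setting(site_config, key):
    """Case-sensitive lookup: the site value if the exact key exists, else the default."""
    if key in site_config:
        return site_config[key]
    return DEFAULTS.get(key)


# The misspelled key is stored, but never found under the expected name.
misspelled = {"hbase.ZooKeeper.quorum": "m146,m145,m143"}
corrected = {"hbase.zookeeper.quorum": "m143,m145,m146"}

print(get_setting(misspelled, "hbase.zookeeper.quorum"))  # localhost
print(get_setting(corrected, "hbase.zookeeper.quorum"))   # m143,m145,m146
```

Under this model the misspelled entry is silently ignored rather than rejected, which is consistent with the connectString=localhost:2181 seen in the failing logs.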