HBase, mail # user - A region server stopped (timeout after trying to connect local Zookeeper)


Re: A region server stopped (timeout after trying to connect local Zookeeper)
Jean-Marc Spaggiari 2012-11-21, 17:22
Hi,

What do you have in your HBase configuration? Are you passing the names
of the quorum servers?
$ cat conf/hbase-site.xml
......
  </property>
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>cube,latitude,node3</value>
      <description>Comma separated list of servers in the ZooKeeper Quorum.
      For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
      By default this is set to localhost for local and pseudo-distributed modes
      of operation. For a fully-distributed setup, this should be set to a full
      list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
      this is the list of servers which we will start/stop ZooKeeper on.
      </description>
    </property>
.....
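
To check which quorum a given hbase-site.xml actually sets, something like the following may help. It is only a sketch, not part of HBase: the function names and the inline sample are made up for illustration. It relies on the fact that Hadoop-style property names are case-sensitive, so a name such as hbase.ZooKeeper.quorum would not be read as hbase.zookeeper.quorum, and the client would then fall back to the localhost default mentioned in the description above.

```python
import xml.etree.ElementTree as ET

# The exact, case-sensitive property name HBase reads for the quorum list.
QUORUM_KEY = "hbase.zookeeper.quorum"


def read_properties(root):
    """Return {name: value} for every <property> under a <configuration> root."""
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name:
            props[name] = value
    return props


def check_quorum(props, key=QUORUM_KEY):
    """Return (hosts, near_misses).

    hosts       : quorum hosts found under the exact key (empty if unset)
    near_misses : property names that match the key only when case is ignored
    """
    hosts = [h.strip() for h in props.get(key, "").split(",") if h.strip()]
    near_misses = [n for n in props if n != key and n.lower() == key.lower()]
    return hosts, near_misses


# Hypothetical sample; for a real file use ET.parse("conf/hbase-site.xml").getroot()
sample = """
<configuration>
  <property>
    <name>hbase.ZooKeeper.quorum</name>
    <value>m146,m145,m143</value>
  </property>
</configuration>
"""

hosts, near_misses = check_quorum(read_properties(ET.fromstring(sample)))
print("quorum hosts:", hosts if hosts else "not set (default is localhost)")
for name in near_misses:
    print("warning: property name differs only in case:", name)
```

If the script reports the quorum as not set, or warns about a name that differs only in case, that would be consistent with a region server log showing connectString=localhost:2181.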

2012/11/21, [EMAIL PROTECTED] <[EMAIL PROTECTED]>:
> Hi,
>
>
> I have the following line in /etc/hosts in all servers, should I keep it or
> comment it out or ...?
>
> 127.0.0.1       localhost
>
> Please help.
>
> Thanks
>
>
>
> On 21 Nov 2012, at 7:16 PM, [EMAIL PROTECTED] wrote:
>
>> Hi,
>>
>>
>> Please help!!
>>
>> HBase version: 0.94
>> ZooKeeper: 3.4.4
>>
>> One of the region servers stopped very quickly after HBase was started:
>>
>> ### Checked jps after the HBase cluster was started and could find the
>> HRegionServer process (*** there is no ZooKeeper instance running on
>> this server ***)
>> $ jps
>> 24767 Jps
>> 18418 TaskTracker
>> 24678 HRegionServer
>> 18156 DataNode
>>
>> ### Waited a while and checked jps again; the HRegionServer process was gone
>> $ jps
>> 18418 TaskTracker
>> 24784 Jps
>> 18156 DataNode
>>
>>
>> ### Here are the settings in hbase-site.xml (enabled
>> hbase.cluster.distributed, set up 3 ZooKeepers, timeout=60000)
>> <property>
>> <name>hbase.cluster.distributed</name>
>> <value>true</value>
>> </property>
>>
>> <property>
>> <name>hbase.ZooKeeper.quorum</name>
>> <value>m146,m145,m143</value>
>> </property>
>>
>> <property>
>> <name>zookeeper.session.timeout</name>
>> <value>60000</value>
>> </property>
>>
>>
>> ### hbase-env.sh also tells HBase not to manage a local instance of
>> ZooKeeper
>> export HBASE_MANAGES_ZK=false
>>
>>
>> ### This server can connect to the 3 ZooKeepers:
>> ./zkCli.sh -server m145,m146,m143   ==>  [zk: m145,m146,m143(CONNECTED)
>> 0]
>>
>>
>> ### Checked the HBase log file and found something odd: it seems to have
>> tried to connect to a local ZooKeeper
>> 2012-11-21 17:30:33,066 INFO org.apache.zookeeper.ZooKeeper: Initiating
>> client connection, connectString=localhost:2181 sessionTimeout=60000
>> watcher=regionserver:60020
>>
>> 2012-11-21 17:31:33,254 WARN
>> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient
>> ZooKeeper exception:
>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>> KeeperErrorCode = ConnectionLoss for /hbase/master
>>
>> 2012-11-21 17:31:33,254 INFO org.apache.hadoop.hbase.util.RetryCounter:
>> Sleeping 2000ms before retry #1...
>> 2012-11-21 17:32:33,262 INFO org.apache.zookeeper.ClientCnxn: Client
>> session timed out, have not heard from server in 60010ms for sessionid
>> 0x0, closing socket connection and attempting reconnect
>>
>> 2012-11-21 17:32:33,362 WARN
>> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient
>> ZooKeeper exception:
>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>> KeeperErrorCode = ConnectionLoss for /hbase/master
>>
>> ......
>>
>> 2012-11-21 17:34:33,570 ERROR
>> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper exists
>> failed after 3 retries
>> 2012-11-21 17:34:33,571 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil:
>> regionserver:60020 Unable to set watcher on znode /hbase/master
>> 2012-11-21 17:34:33,573 ERROR
>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: regionserver:60020
>> Received unexpected KeeperException, re-throwing exception
>> 2012-11-21 17:34:33,573 FATAL
>> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server
>> ......
>> 2012-11-21 17:34:33,576 FATAL
>> org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: