HBase >> mail # user >> A region server stopped (timeout after trying to connect local Zookeeper)


Re: A region server stopped (timeout after trying to connect local Zookeeper)
Hi,

Here are my HBase configuration and test:

1) {$HBASE_HOME}hbase/conf/hbase-site.xml
<property>
<name>hbase.ZooKeeper.quorum</name>
<value>m146,m145,m143</value>
</property>

<property>
<name>zookeeper.session.timeout</name>
<value>60000</value>
</property>
2) {$HBASE_HOME}hbase/conf/hbase-env.sh
export HBASE_MANAGES_ZK=false
3) I ran "{$ZK_HOME}/bin/zkCli.sh -server m145,m146,m143" to test the connection, and it worked:
[zk: m145,m146,m143(CONNECTED) 0]
4) In the logs, the connectString looks odd: the RegionServer did not use the "hbase.ZooKeeper.quorum" setting from conf/hbase-site.xml. It seems to have fallen back to the default and tried to connect to "localhost:2181", even though this is a distributed cluster:

2012-11-21 17:21:42,299 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=60000 watcher=regionserver:60020
...
2012-11-21 17:21:42,313 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (Unable to locate a login configura$
...
2012-11-21 17:21:42,316 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused
...  (Note: it retried the above 3 times, then hit the FATAL error below.)
      
2012-11-21 17:21:57,846 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: regionserver:60020 Received unexpected KeeperException, re-throwing exception
...
2012-11-21 17:21:57,847 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server ...

Please help.
 
Thanks
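
One thing that may be worth checking, given the log above: Hadoop-style configuration keys are case-sensitive, and the key HBase reads is `hbase.zookeeper.quorum` in all lowercase (as in the hbase-default description quoted below), while the hbase-site.xml above spells it `hbase.ZooKeeper.quorum`. A key that differs only in case would be ignored, which would leave the localhost default in place. The following is a minimal sketch of such a check, not an HBase tool: `EXPECTED_KEYS`, `find_miscased_keys`, and `SAMPLE` are hypothetical names introduced here, and the key list is an assumption limited to the properties mentioned in this thread.

```python
import xml.etree.ElementTree as ET

# Keys from this thread, spelled in lowercase as HBase looks them up.
# Assumption: this short list is illustrative, not the full set of HBase keys.
EXPECTED_KEYS = {
    "hbase.zookeeper.quorum",
    "hbase.cluster.distributed",
    "zookeeper.session.timeout",
}


def find_miscased_keys(xml_text):
    """Return (found, expected) pairs where only the letter case differs."""
    by_lower = {key.lower(): key for key in EXPECTED_KEYS}
    root = ET.fromstring(xml_text)
    mismatches = []
    for name in root.iter("name"):
        found = (name.text or "").strip()
        expected = by_lower.get(found.lower())
        if expected is not None and expected != found:
            mismatches.append((found, expected))
    return mismatches


# Sample mirroring the properties posted above, wrapped in a root element.
SAMPLE = """<configuration>
  <property>
    <name>hbase.ZooKeeper.quorum</name>
    <value>m146,m145,m143</value>
  </property>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>60000</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for found, expected in find_miscased_keys(SAMPLE):
        print(f"{found!r} should probably be {expected!r}")
```

To check a real file, pass its contents instead of `SAMPLE`, for example `find_miscased_keys(open("conf/hbase-site.xml").read())`.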

On 22 Nov 2012, at 1:22 AM, Jean-Marc Spaggiari wrote:

> Hi,
>
> What do you have in your HBase configuration? Are you passing the names
> of the quorum servers?
> $ cat conf/hbase-site.xml
> ......
>  </property>
>    <property>
>      <name>hbase.zookeeper.quorum</name>
>      <value>cube,latitude,node3</value>
>      <description>Comma separated list of servers in the ZooKeeper Quorum.
>      For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
>      By default this is set to localhost for local and pseudo-distributed modes
>      of operation. For a fully-distributed setup, this should be set to a full
>      list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
>      this is the list of servers which we will start/stop ZooKeeper on.
>      </description>
>    </property>
> .....
>
> 2012/11/21, [EMAIL PROTECTED] <[EMAIL PROTECTED]>:
>> Hi,
>>
>>
>> I have the following line in /etc/hosts on all servers. Should I keep it,
>> comment it out, or ...?
>>
>> 127.0.0.1       localhost
>>
>> Please help.
>>
>> Thanks
>>
>>
>>
>> On 21 Nov 2012, at 7:16 PM, [EMAIL PROTECTED] wrote:
>>
>>> Hi,
>>>
>>>
>>> Please help!!
>>>
>>> HBase version: 0.94
>>> ZooKeeper: 3.4.4
>>>
>>> One of the region servers stopped very quickly after HBase was started:
>>>
>>> ### Checked jps after the HBase cluster was started and could find the
>>> HRegionServer process (*** there is no ZooKeeper instance running on
>>> this server ***)
>>> $ jps
>>> 24767 Jps
>>> 18418 TaskTracker
>>> 24678 HRegionServer
>>> 18156 DataNode
>>>
>>> ### Waited a while and checked jps again; the HRegionServer process was gone
>>> $ jps
>>> 18418 TaskTracker
>>> 24784 Jps
>>> 18156 DataNode
>>>
>>>
>>> ### Here are the settings in hbase-site.xml (enabled
>>> hbase.cluster.distributed, set up 3 ZooKeepers, timeout = 60000)
>>> <property>
>>> <name>hbase.cluster.distributed</name>
>>> <value>true</value>
>>> </property>
>>>
>>> <property>
>>> <name>hbase.ZooKeeper.quorum</name>
>>> <value>m146,m145,m143</value>
>>> </property>
>>>
>>> <property>
>>> <name>zookeeper.session.timeout</name>
>>> <value>60000</value>
>>> </property>
>>>
>>>
>>> ### hbase-env.sh also tells HBase not to manage a local ZooKeeper
>>> instance
>>> export HBASE_MANAGES_ZK=false
>>>
>>>
>>> ###This server can connect to the 3 ZooKeepers,