HBase, mail # user - A region server stopped (timeout after trying to connect local Zookeeper)


Re: A region server stopped (timeout after trying to connect local Zookeeper)
ac@...) 2012-11-22, 01:53
Hi JM,

Thank you!

It is indeed case sensitive. Simply changing the 'Z' to 'z' brings back ALL RegionServers (and changing it back to 'Z' can bring them all down again). I spent a few hours looking at other areas and had not realized the effect of this 'Z'.

Thanks again.
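
For anyone who hits the same problem, here is a minimal sketch of a check that flags property names in hbase-site.xml that differ from an expected key only by letter case. The function name find_miscased_keys and the EXPECTED_KEYS set are hypothetical helpers for illustration; the set contains only the keys mentioned in this thread, so extend it for your own configuration.

```python
import xml.etree.ElementTree as ET

# Assumption: only the keys that appear in this thread; extend as needed.
EXPECTED_KEYS = {
    "hbase.zookeeper.quorum",
    "zookeeper.session.timeout",
    "hbase.regionserver.codecs",
}


def find_miscased_keys(xml_text, expected=EXPECTED_KEYS):
    """Return (found, expected) pairs for names that differ only by case."""
    by_lower = {key.lower(): key for key in expected}
    root = ET.fromstring(xml_text)
    problems = []
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        canonical = by_lower.get(name.lower())
        # A match after lowercasing that is not an exact match is a case typo.
        if canonical is not None and canonical != name:
            problems.append((name, canonical))
    return problems


# The configuration from the original report, wrapped in a root element.
SAMPLE = """<configuration>
  <property>
    <name>hbase.ZooKeeper.quorum</name>
    <value>m146,m145,m143</value>
  </property>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>60000</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for found, wanted in find_miscased_keys(SAMPLE):
        print(f"{found!r} should be {wanted!r}")
```

To check a real file, pass its contents instead of SAMPLE, for example find_miscased_keys(open("conf/hbase-site.xml").read()). Properties inside XML comments are skipped by the parser, so a commented-out block is not reported.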
 

On 22 Nov 2012, at 8:39 AM, Jean-Marc Spaggiari wrote:

> I think the MAIN difference is the uppercase in the property name. It
> seems that property names in hbase-site.xml are case sensitive (which
> is normal in the Java and Unix world).
>
> You might want to retry by putting back the uppercase to see if this
> was the issue.
>
> JM
>
> 2012/11/21, [EMAIL PROTECTED] <[EMAIL PROTECTED]>:
>> Hi
>>
>> I changed the order of ZooKeepers in the value of hbase.zookeeper.quorum,
>> from "m146,m145,m143" to "m143,m145,m146", set timeout from 60000 to 70000,
>> and commented out the lzo property. It works now. Here is the diff:
>>
>> 1) $ diff hbase-site.xml hbase-site.xml.xxx
>> 41,44c41,43
>> <
>> < <property>
>> < <name>hbase.zookeeper.quorum</name>
>> < <value>m143,m145,m146</value>
>> ---
>>> <property>
>>> <name>hbase.ZooKeeper.quorum</name>
>>> <value>m146,m145,m143</value>
>> 49c48,55
>> < <value>70000</value>
>> ---
>>> <value>60000</value>
>>> </property>
>>>
>>> <!--
>>> /**
>>> <property>
>>> <name>hbase.regionserver.codecs</name>
>>> <value>lzo,gz</value>
>> 50a57,58
>>> **/
>>> -->
>>
>> The above is the only change made today.
>>
>>
>> 2) hbase log:
>> 2012-11-22 07:26:19,431 INFO org.apache.zookeeper.ZooKeeper: Initiating
>> client connection, connectString=m145:2181,m143:2181,m146:2181
>> sessionTimeout=70000 watcher=regionserver:6$
>>
>>
>> I don't know why but it works now. It seems that hbase somehow could not
>> read in hbase-site.xml correctly.
>>
>>
>> Thanks
>>
>>
>>
>>
>> On 22 Nov 2012, at 7:51 AM, Jean-Marc Spaggiari wrote:
>>
>>> Can you run jps on your master and look at the logs too?
>>>
>>> Another thing: can you try with hbase.zookeeper.quorum instead of
>>> hbase.ZooKeeper.quorum?
>>>
>>> 2012/11/21, [EMAIL PROTECTED] <[EMAIL PROTECTED]>:
>>>> Hi,
>>>>
>>>> Here are my HBase configuration and test:
>>>>
>>>> 1) {$HBASE_HOME}hbase/conf/hbase-site.xml
>>>> <property>
>>>> <name>hbase.ZooKeeper.quorum</name>
>>>> <value>m146,m145,m143</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>zookeeper.session.timeout</name>
>>>> <value>60000</value>
>>>> </property>
>>>>
>>>>
>>>> 2) {$HBASE_HOME}hbase/conf/hbase-env.sh
>>>> export HBASE_MANAGES_ZK=false
>>>>
>>>>
>>>> 3) I used " {$ZK_HOME}/bin/zkCli.sh -server m145,m146,m143"  to test the
>>>> connection, it worked
>>>> [zk: m145,m146,m143(CONNECTED) 0]
>>>>
>>>>
>>>> 4) From the logs, I found that the connectString was odd: the RegionServer
>>>> did not use the setting of "hbase.ZooKeeper.quorum" in conf/hbase-site.xml.
>>>> It seemed that it always used the default and tried to connect to
>>>> "localhost:2181" in the distributed cluster:
>>>>
>>>> 2012-11-21 17:21:42,299 INFO org.apache.zookeeper.ZooKeeper: Initiating
>>>> client connection, connectString=localhost:2181 sessionTimeout=60000
>>>> watcher=regionserver:60020
>>>> ...
>>>> 2012-11-21 17:21:42,313 INFO org.apache.zookeeper.ClientCnxn: Opening
>>>> socket connection to server localhost/127.0.0.1:2181. Will not attempt to
>>>> authenticate using SASL (Unable to locate a login configura$
>>>> ...
>>>> 2012-11-21 17:21:42,316 WARN org.apache.zookeeper.ClientCnxn: Session 0x0
>>>> for server null, unexpected error, closing socket connection and attempting
>>>> reconnect java.net.ConnectException: Connection refused
>>>> ...  (remark: it retried the above 3 times, then hit the FATAL error below)
>>>>
>>>> 2012-11-21 17:21:57,846 ERROR
>>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: regionserver:60020
>>>> Received unexpected KeeperException, re-throwing exception
>>>> ...
>>>> 2012-11-21 17:21:57,847 FATAL
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server
>>>> ...
>>>>
>>>>
>>>>
>>>> Please help.
>>>>
>>>> Thanks
>>>>
>>>>
>>>>