HBase, mail # user - HBaseAdmin#checkHBaseAvailable COST ABOUT 1 MINUTE TO CHECK A DEAD(OR NOT EXISTS) HBASE MASTER


Re: Re: HBaseAdmin#checkHBaseAvailable COST ABOUT 1 MINUTE TO CHECK A DEAD(OR NOT EXISTS) HBASE MASTER
Esteban Gutierrez 2013-11-14, 06:33
jingych,

inline:

On Wed, Nov 13, 2013 at 7:06 PM, jingych <[EMAIL PROTECTED]> wrote:

>  Thanks, Esteban and Stack!
>
> As Esteban said, the problem was solved.
>
> My config is below:
> <code>
> conf.setInt("hbase.client.retries.number", 1);
> conf.setInt("zookeeper.session.timeout", 5000);
> conf.setInt("zookeeper.recovery.retry", 1);
> conf.setInt("zookeeper.recovery.retry.intervalmill", 50);
> </code>
> But it still took 46 seconds.
> And the log printing:
> <log>
>
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
>
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/root-region-server
>
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/root-region-server
>
> </log>
> It still tried to build the 4 connections above.
>

The client (via HConnectionManager) needs to set watchers on each of those
3 znodes in ZK. Each attempt has a max timeout of 5 seconds (you have a
single ZK server), plus 10 seconds for the second attempt: 3 * (5 * 2^0) +
3 * (5 * 2^1) = 45 seconds, and the extra second should come from a
hardcoded sleep in the RPC implementation during a retry.
Setting zookeeper.recovery.retry=0 can make it fail faster, but in case of
a transient failure you will then have to handle the reconnection in your
code.
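That arithmetic can be sketched as a small stand-alone calculation (the constants mirror the settings quoted earlier in the thread; this is illustrative math only, not actual HBase client code, and the ~1-second RPC retry sleep is taken from the explanation above rather than verified against the source):

```java
public class ZkRetryMath {
    public static void main(String[] args) {
        int znodes = 3;            // /hbase/hbaseid, /hbase/master, /hbase/root-region-server
        int sessionTimeoutSec = 5; // zookeeper.session.timeout = 5000 ms, single ZK server
        int attempts = 2;          // initial attempt + zookeeper.recovery.retry = 1

        int totalSec = 0;
        for (int attempt = 0; attempt < attempts; attempt++) {
            // exponential backoff: attempt N waits sessionTimeout * 2^N per znode
            totalSec += znodes * sessionTimeoutSec * (1 << attempt);
        }
        // 3 * (5 * 2^0) + 3 * (5 * 2^1) = 45; the observed 46s adds ~1s of RPC retry sleep
        System.out.println(totalSec + " seconds");
    }
}
```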

>
> Could you please explain why ZK does this? (I'm really new to the HBase
> world.)
> If I set the ZK session timeout to 1s, is it OK?
>

You *could*, but you don't want clients to overwhelm ZK by re-establishing
connections over and over.
> And what do you mean by "depending on the number of ZK servers you have
> running the socket level timeout in the client to a ZK server will be
> zookeeper.session.timeout/#ZKs"?
> Does it mean that if I have 3 ZooKeepers and zookeeper.session.timeout=5000,
> each connection will have a 5000/3 timeout?
>

That's correct: the timeout to establish a connection to ZK will be around
1.6 seconds (5000 milliseconds / 3) with 3 ZKs.
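That division can be sketched in plain arithmetic (assuming, as described above, that the session timeout is split evenly across the quorum; the class and variable names here are illustrative, not from the ZK client):

```java
public class ZkConnectTimeout {
    public static void main(String[] args) {
        int sessionTimeoutMs = 5000; // zookeeper.session.timeout
        int quorumSize = 3;          // number of ZK servers in the quorum

        // the per-server connect timeout is the session timeout divided by the quorum size
        int perServerTimeoutMs = sessionTimeoutMs / quorumSize;
        System.out.println(perServerTimeoutMs + " ms"); // 1666 ms, i.e. ~1.6 seconds
    }
}
```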
> I'm running ZK and HBase Master at one node as pseudo-distributed mode.
>

> Best Regards!
>
> ------------------------------
>
> jingych
>
> 2013-11-14
>
>  *From:* Esteban Gutierrez <[EMAIL PROTECTED]>
> *Sent:* 2013-11-14 06:10
> *To:* Stack <[EMAIL PROTECTED]>
> *Cc:* Hbase-User <[EMAIL PROTECTED]>; jingych <[EMAIL PROTECTED]>
> *Subject:* Re: Re: HBaseAdmin#checkHBaseAvailable COST ABOUT 1 MINUTE TO CHECK
> A DEAD(OR NOT EXISTS) HBASE MASTER
>
> jingych,
>
> That timeout comes from ZooKeeper. Are you running ZK on the same node you
> are running the HBase Master on? If your environment requires failing fast
> even for ZK connection timeouts, then you need to reduce
> zookeeper.recovery.retry.intervalmill and zookeeper.recovery.retry, since
> the retries are done via an exponential backoff (1 second, 2 seconds, 8
> seconds). Also, depending on the number of ZK servers you have running, the
> socket level timeout in the client to a ZK server will be
> zookeeper.session.timeout/#ZKs.
>
> cheers,
> esteban.
>
>
>  --
> Cloudera, Inc.
>
>
>
> On Wed, Nov 13, 2013 at 7:21 AM, Stack <[EMAIL PROTECTED]> wrote:
>
>> More of the log and the version of HBase involved please.  Thanks.
>> St.Ack
>>
>>
>>  On Wed, Nov 13, 2013 at 1:07 AM, jingych <[EMAIL PROTECTED]> wrote:
>>
>>> Thanks, esteban!
>>>
>>> I've tried, but it did not work.
>>>
>>> I first load the custom hbase-site.xml, and then try to check the
>>> HBase server.
>>> So my code is like this:
>>> <code>
>>> conf.setInt("hbase.client.retries.number", 1);
>>> conf.setInt("hbase.client.pause", 5);
>>> conf.setInt("ipc.socket.timeout", 5000);
>>> conf.setInt("hbase.rpc.timeout", 5000);
>>> </code>
>>>
>>> The log printing: Sleeping 4000ms before retry #2...
>>>
>>> If the zookeeper's quarum is the wrong address, the process will take