Re: hbase master retries to RS/DN
Thanks. Even with that value set to "2", we still get slow master
recovery of logs after a DN death:

2011-05-19 23:34:55,109 WARN org.apache.hadoop.hdfs.DFSClient: Failed
recovery attempt #3 from primary datanode 10.103.7.21:50010
java.net.ConnectException: Call to /10.103.7.21:50020 failed on
connection exception: java.net.ConnectException: Connection refused
It keeps trying to contact a datanode that is not alive. Shouldn't it
mark the DN as dead and not try it again?

-Jack
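
A rough back-of-envelope, reading off the two log excerpts above and
assuming the branch-0.20 defaults (about 10 connect attempts one second
apart per IPC call, and several lease-recovery attempts per log, as the
"recovery attempt #3" line suggests):

   ~10 connect attempts x ~1s back-off  ≈ 10s per call to the dead DN
   x a handful of recovery attempts     ≈ tens of seconds per log
   x every log file being split         ≈ minutes of master stall

Lowering the connect retries (see the sketch at the end of the thread)
shrinks the first factor, but the DFSClient still walks through its
recovery attempts against the dead primary datanode, which may be why
recovery stays slow even with the value at "2".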

On Thu, May 19, 2011 at 2:22 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> The config and the retries you pasted are unrelated.
>
> The former controls the number of retries when regions are moving and
> the client must query .META. or -ROOT-.
>
> The latter comes from the Hadoop RPC client; looking at the code, the
> relevant config is ipc.client.connect.max.retries:
> https://github.com/apache/hadoop/blob/branch-0.20/src/core/org/apache/hadoop/ipc/Client.java#L631
>
> J-D
>
> On Thu, May 19, 2011 at 11:46 AM, Jack Levin <[EMAIL PROTECTED]> wrote:
>> Hello, we have a situation where, when a RS/DN crashes hard, the master is
>> very slow to recover; we notice that it waits on these log lines:
>> 2011-05-19 11:20:57,766 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: /10.103.7.22:50020. Already tried 0 time(s).
>> 2011-05-19 11:20:58,767 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: /10.103.7.22:50020. Already tried 1 time(s).
>> 2011-05-19 11:20:59,768 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: /10.103.7.22:50020. Already tried 2 time(s).
>> 2011-05-19 11:21:00,768 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: /10.103.7.22:50020. Already tried 3 time(s).
>> 2011-05-19 11:21:01,769 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: /10.103.7.22:50020. Already tried 4 time(s).
>> 2011-05-19 11:21:02,769 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: /10.103.7.22:50020. Already tried 5 time(s).
>> 2011-05-19 11:21:03,770 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: /10.103.7.22:50020. Already tried 6 time(s).
>> 2011-05-19 11:21:04,771 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: /10.103.7.22:50020. Already tried 7 time(s).
>> 2011-05-19 11:21:05,771 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: /10.103.7.22:50020. Already tried 8 time(s).
>> 2011-05-19 11:21:06,772 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: /10.103.7.22:50020. Already tried 9 time(s).
>>
>> This set repeats multiple times during log splitting, so I looked
>> around and set this config:
>>
>>  <property>
>>    <name>hbase.client.retries.number</name>
>>    <value>2</value>
>>    <description>Maximum retries.  Used as maximum for all retryable
>>    operations such as fetching of the root region from root region
>>    server, getting a cell's value, starting a row update, etc.
>>    Default: 10.
>>    </description>
>>  </property>
>>
>> Unfortunately, the next time a server died, it made no difference.  Is this
>> a known issue for 0.89?  If so, was it resolved in 0.90.2?
>>
>> -Jack
>>
>
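
For reference, the setting J-D points at is a Hadoop property, not an
HBase one; hbase.client.retries.number only bounds HBase client lookups
against -ROOT-/.META., which is why changing it made no difference here.
A minimal sketch of lowering the IPC connect retries instead, assuming
it goes into a configuration file the master's DFSClient actually reads
(core-site.xml, or hbase-site.xml on the master's classpath) and
treating the value as purely illustrative:

  <property>
    <name>ipc.client.connect.max.retries</name>
    <value>3</value>
    <description>Number of times the IPC client attempts to connect to a
    server before giving up.  Default: 10.  Illustrative value only; pick
    it based on how long a dead node may be allowed to stall recovery.
    </description>
  </property>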