HBase >> mail # user >> hbase master retries to RS/DN


hbase master retries to RS/DN
Hello, we have a situation where, when an RS/DN crashes hard, the master is
very slow to recover. We notice that it waits on these log lines:
2011-05-19 11:20:57,766 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: /10.103.7.22:50020. Already tried 0 time(s).
2011-05-19 11:20:58,767 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: /10.103.7.22:50020. Already tried 1 time(s).
2011-05-19 11:20:59,768 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: /10.103.7.22:50020. Already tried 2 time(s).
2011-05-19 11:21:00,768 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: /10.103.7.22:50020. Already tried 3 time(s).
2011-05-19 11:21:01,769 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: /10.103.7.22:50020. Already tried 4 time(s).
2011-05-19 11:21:02,769 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: /10.103.7.22:50020. Already tried 5 time(s).
2011-05-19 11:21:03,770 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: /10.103.7.22:50020. Already tried 6 time(s).
2011-05-19 11:21:04,771 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: /10.103.7.22:50020. Already tried 7 time(s).
2011-05-19 11:21:05,771 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: /10.103.7.22:50020. Already tried 8 time(s).
2011-05-19 11:21:06,772 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: /10.103.7.22:50020. Already tried 9 time(s).

This set repeats multiple times during log splitting. So I looked around
and set this config:

  <property>
    <name>hbase.client.retries.number</name>
    <value>2</value>
    <description>Maximum retries.  Used as maximum for all retryable
    operations such as fetching of the root region from root region
    server, getting a cell's value, starting a row update, etc.
    Default: 10.
    </description>
  </property>

Unfortunately, the next time a server died, it made no difference. Is this
a known issue for 0.89? If so, was it resolved in 0.90.2?
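
To put a number on the delay, here is a minimal Python sketch that groups the
"Retrying connect" lines from a master log into retry cycles and reports how
long each cycle lasted. It is only an illustration: the function name
retry_cycles and the regular expression are assumptions based on the excerpt
above, not anything shipped with HBase or Hadoop.

```python
import re
from datetime import datetime

# Pattern for the org.apache.hadoop.ipc.Client retry lines shown in the excerpt.
# Assumption: the log format matches the lines quoted in this message.
RETRY_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO "
    r"org\.apache\.hadoop\.ipc\.Client: Retrying connect to server: "
    r"(\S+)\. Already tried (\d+) time\(s\)\."
)


def retry_cycles(log_text):
    """Return a list of (server, attempts, seconds) tuples, one per retry cycle.

    A new cycle starts whenever the "Already tried" counter resets to 0
    or the target server changes.
    """
    # Collapse all whitespace so entries wrapped across lines match again.
    text = " ".join(log_text.split())
    cycles = []
    current = None
    for match in RETRY_RE.finditer(text):
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        server = match.group(2)
        tried = int(match.group(3))
        if current is None or tried == 0 or current["server"] != server:
            current = {"server": server, "start": stamp, "end": stamp, "attempts": 0}
            cycles.append(current)
        current["end"] = stamp
        current["attempts"] = tried + 1
    return [
        (c["server"], c["attempts"], (c["end"] - c["start"]).total_seconds())
        for c in cycles
    ]
```

Run against the ten lines quoted above, this reports a single cycle of 10
attempts spanning roughly nine seconds; each repetition of the set during log
splitting adds another cycle of about the same length.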

-Jack