Re: Region Servers crashing following: "File does not exist", "Too many open files" exceptions
Dhaval Shah 2013-02-10, 01:24

Looking at your error messages, it seems you need to increase the limit on the number of xceivers in your HDFS configuration.
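For reference, the transceiver limit is set in hdfs-site.xml on each DataNode. A sketch of what the change might look like — the property name below is the historical (deliberately misspelled) one; in Hadoop 2 it was superseded by dfs.datanode.max.transfer.threads, so verify the name and a suitable value against your CDH4 documentation:

```xml
<!-- hdfs-site.xml on each DataNode; restart the DataNodes after changing.
     "xcievers" is the historical, misspelled property name. -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```

Since you are also seeing "Too many open files", it may also be worth checking the file-descriptor limit (nofile) for the users running the DataNode and RegionServer processes.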
------------------------------
On Sun 10 Feb, 2013 6:37 AM IST David Koch wrote:

>Hello,
>
>As of lately, we have been having issues with Region Servers crashing in
>our cluster. This happens while running Map/Reduce jobs over HBase tables
>in particular but also spontaneously when the cluster is seemingly idle.
>
>Restarting the Region Servers or even HBase entirely as well as HDFS and
>Map/Reduce services does not fix the problem and jobs will fail during the
>next attempt citing "Region not served" exceptions. It is not always the
>same nodes that crash.
>
>The log data during the minutes leading up to the crash contains many "File
>does not exist /hbase/<table_name>/..." error messages, which change to "Too
>many open files" messages; finally, there are a few "Failed to renew lease
>for DFSClient" messages, followed by several "FATAL" messages about HLog not
>being able to sync, and immediately afterwards a terminal "ABORTING region
>server".
>
>You can find an extract of a Region Server log here:
>http://pastebin.com/G39LQyQT.
>
>Running "hbase hbck" reveals inconsistencies in some tables, but attempting
>a repair with "hbase hbck -repair" stalls due to some regions being in
>transition, see here: http://pastebin.com/JAbcQ4cc.
>
>The setup consists of 30 machines with 26GB RAM each. The services are
>managed using CDH4, so the HBase version is 0.92.x. We did not tweak any of
>the default configuration settings; however, table scans are done with
>sensible scan/batch/filter settings.
>
>Data intake is about 100GB/day, added at a time when no Map/Reduce jobs are
>running. Tables have between 100 * 10^6 and 2 * 10^9 rows, with an average
>of 10 KVs per row, about 1KB each. Very few rows exceed 10^6 KVs.
>
>What can we do to fix these issues? Are they symptomatic of a misconfigured
>setup or of some critical threshold being reached? The cluster used to be
>stable.
>
>Thank you,
>
>/David
Other messages in this thread:
David Koch 2013-02-10, 01:07 (original message)
David Koch 2013-02-10, 02:17
Marcos Ortiz 2013-02-10, 03:22
David Koch 2013-02-10, 12:51
shashwat shriparv 2013-02-10, 14:53
David Koch 2013-02-10, 20:11
ramkrishna vasudevan 2013-02-11, 03:58
David Koch 2013-02-11, 15:24
ramkrishna vasudevan 2013-02-11, 16:50
David Koch 2013-02-11, 22:14