Tasktrackers slow to subscribe
Alex Current, 2013-02-20 23:11
Hadoop 1.0.4
Java JDK 6u37
CentOS 6.3

I am having a strange issue where the TTs are slow to rejoin the cluster after a restart. I issued a stop-all / start-all on the cluster. The DNs came up immediately: all of them were reported as alive in the NN UI within 5 to 10 seconds of the restart. Once the NN is out of Safe Mode (30 seconds), the TTs are slow to rejoin the cluster; some of them take up to 20 minutes. Until then they don't show up in the UI or on the CLI (hadoop job -list-active-trackers).

I have tried:
- Stopping / starting the cluster with the stop-all / start-all scripts AND with the stop-mapred / stop-hdfs / start-hdfs / start-mapred scripts.
- Stopping and starting the DN / TT on the nodes directly.
- Running jobs while waiting for the TTs to subscribe.

Nothing seems to "kick" them into subscribing. Here is the log from a DN/TT node; notice the timestamps, in particular the gap of almost 21 minutes between 00:00:16 and 00:21:00. Thanks in advance for any help.

************************************************************
STARTUP_MSG: Starting TaskTracker
STARTUP_MSG: host
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.0.4
STARTUP_MSG: build https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct 3 05:13:58 UTC 2012
************************************************************/
2013-02-20 00:00:15,552 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2013-02-20 00:00:15,612 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2013-02-20 00:00:15,613 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2013-02-20 00:00:15,613 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: TaskTracker metrics system started
2013-02-20 00:00:15,880 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2013-02-20 00:00:16,087 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2013-02-20 00:00:16,146 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2013-02-20 00:00:16,171 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-02-20 00:00:16,174 INFO org.apache.hadoop.mapred.TaskTracker: Starting tasktracker with owner as hadoop
2013-02-20 00:00:16,175 INFO org.apache.hadoop.mapred.TaskTracker: Good mapred local directories are: /1/mapred/local,/2/mapred/local,/3/mapred/local
2013-02-20 00:21:00,609 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2013-02-20 00:21:00,901 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm registered.
2013-02-20 00:21:00,902 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source TaskTrackerMetrics registered.
2013-02-20 00:21:00,971 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
2013-02-20 00:21:00,973 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort64357 registered.
2013-02-20 00:21:00,973 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcActivityForPort64357 registered.
2013-02-20 00:21:00,977 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2013-02-20 00:21:00,977 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 64357: starting
2013-02-20 00:21:00,978 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 64357: starting
2013-02-20 00:21:00,978 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 64357: starting
2013-02-20 00:21:00,978 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 64357: starting
2013-02-20 00:21:00,978 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 64357: starting
2013-02-20 00:21:00,979 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 64357: starting
2013-02-20 00:21:00,979 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 64357: starting
2013-02-20 00:21:00,979 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 64357: starting
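To locate the stall in a TaskTracker log mechanically, a small script can report the longest pause between consecutive timestamped entries. This is a minimal sketch and not a Hadoop tool. It assumes every entry starts with a log4j-style timestamp such as 2013-02-20 00:00:16,175, as in the log above, and the helper name largest_gap is hypothetical. Lines without a leading timestamp (the STARTUP_MSG banner, for example) are skipped.

```python
import re
import sys
from datetime import datetime

# Assumed layout: "yyyy-MM-dd HH:mm:ss,SSS" at the very start of each entry,
# matching the TaskTracker log shown above.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def largest_gap(lines):
    """Hypothetical helper: return (gap_seconds, line_before, line_after) for
    the longest pause between consecutive timestamped entries, or None if
    fewer than two timestamped entries are found."""
    prev_ts, prev_line = None, None
    best = None
    for line in lines:
        m = TS.match(line)
        if not m:
            continue  # banner or continuation line without a timestamp
        # %f accepts the 3-digit millisecond field (",552" -> 0.552 s)
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_ts, prev_line = ts, line
    return best


if __name__ == "__main__":
    # Usage (assumed): python largest_gap.py <path to the TaskTracker log>
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            result = largest_gap(fh)
        if result:
            gap, before, after = result
            print(f"{gap:.3f}s gap between:\n  {before.rstrip()}\n  {after.rstrip()}")
```

Run against the log above, the longest pause it reports is the one between the "Good mapred local directories" entry at 00:00:16,175 and the NativeCodeLoader entry at 00:21:00,609 (about 1244 seconds), which narrows down where in TaskTracker startup the time is being spent.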