Hadoop, mail # user - Tasktrackers slow to subscribe


Tasktrackers slow to subscribe
Alex Current 2013-02-20, 23:11
Hadoop 1.0.4
Java JDK 6u37
CentOS 6.3

I am having a strange issue where the TTs are slow to rejoin the cluster
after a restart.

I issued a stop-all / start-all on the cluster.  The DNs came up
immediately: all of them were reported as alive in the NN UI within 5 to 10
seconds of the restart.  Once the NN is out of Safe Mode (30 seconds), the
TTs are slow to rejoin the cluster; some of them take up to 20 minutes.
Until then they don't show up in the UI or on the CLI
(hadoop job -list-active-trackers).
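
To put a number on how long the TTs take to come back, the CLI check above can be polled in a loop. The following is only a rough sketch: the function names, the 30-second polling interval, and the assumption that each active tracker is printed on its own line starting with "tracker_" are illustrative and not taken from the cluster described here.

```python
import subprocess
import time


def count_trackers(output):
    """Count lines that look like tracker names.

    Assumption: each active tracker is printed on its own line
    and the name starts with 'tracker_'.
    """
    return sum(
        1 for line in output.splitlines() if line.strip().startswith("tracker_")
    )


def poll_trackers(expected, interval=30, run=None):
    """Poll until `expected` trackers are listed; return elapsed seconds."""
    if run is None:
        # Default: shell out to the CLI command mentioned in the post.
        def run():
            return subprocess.check_output(
                ["hadoop", "job", "-list-active-trackers"],
                universal_newlines=True,
            )
    start = time.time()
    while True:
        n = count_trackers(run())
        print("%6.0fs  %d/%d trackers" % (time.time() - start, n, expected))
        if n >= expected:
            return time.time() - start
        time.sleep(interval)
```

Calling poll_trackers(expected=N) right after start-all, with N set to the number of TT nodes, prints one line per poll and returns once every tracker has registered. The `run` parameter exists only so the loop can be exercised without a live cluster.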

I have attempted...

Stopping / starting the cluster with the stop-all / start-all scripts AND
with the stop-mapred/stop-hdfs / start-hdfs/start-mapred scripts.
Stopping and starting the DN / TT on the nodes directly.
Running jobs while waiting for the TTs to subscribe.

Nothing seems to "kick" them into subscribing.
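
One way to see where a TT stalls during startup is to scan its log for the largest gap between consecutive timestamps. The sketch below assumes the log4j timestamp layout "YYYY-MM-DD HH:MM:SS,mmm" at the start of each entry; the function names are made up for illustration, and lines without a leading timestamp (wrapped continuations, the STARTUP_MSG banner) are simply skipped.

```python
import re
from datetime import datetime

# Matches a leading timestamp such as "2013-02-20 00:00:16,175".
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def parse_timestamps(lines):
    """Yield (datetime, line) for every line that starts with a timestamp."""
    for line in lines:
        m = TS.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            millis = int(m.group(2))
            yield ts.replace(microsecond=millis * 1000), line.rstrip("\n")


def largest_gap(lines):
    """Return (gap_seconds, line_before, line_after), or None if < 2 entries."""
    entries = list(parse_timestamps(lines))
    best = None
    for (t0, l0), (t1, l1) in zip(entries, entries[1:]):
        gap = (t1 - t0).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, l0, l1)
    return best
```

Passing an open log file to largest_gap() returns the two entries that bracket the longest pause, which narrows down what the TT was doing while it appeared hung.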

Here is the log from a DN/TT node; note the timestamps.  Thanks in
advance for any help.

/************************************************************
STARTUP_MSG: Starting TaskTracker
STARTUP_MSG:   host
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 1.0.4
STARTUP_MSG:   build https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r
1393290; compiled by 'hortonfo' on Wed Oct  3 05:13:58 UTC 2012
************************************************************/
2013-02-20 00:00:15,552 INFO org.apache.hadoop.metrics2.impl.MetricsConfig:
loaded properties from hadoop-metrics2.properties
2013-02-20 00:00:15,612 INFO
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
MetricsSystem,sub=Stats registered.
2013-02-20 00:00:15,613 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
period at 10 second(s).
2013-02-20 00:00:15,613 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: TaskTracker metrics
system started
2013-02-20 00:00:15,880 INFO
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi
registered.
2013-02-20 00:00:16,087 INFO org.mortbay.log: Logging to
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
org.mortbay.log.Slf4jLog
2013-02-20 00:00:16,146 INFO org.apache.hadoop.http.HttpServer: Added
global filtersafety
(class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2013-02-20 00:00:16,171 INFO org.apache.hadoop.mapred.TaskLogsTruncater:
Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-02-20 00:00:16,174 INFO org.apache.hadoop.mapred.TaskTracker: Starting
tasktracker with owner as hadoop
2013-02-20 00:00:16,175 INFO org.apache.hadoop.mapred.TaskTracker: Good
mapred local directories are:
/1/mapred/local,/2/mapred/local,/3/mapred/local
2013-02-20 00:21:00,609 INFO org.apache.hadoop.util.NativeCodeLoader:
Loaded the native-hadoop library
2013-02-20 00:21:00,901 INFO
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm
registered.
2013-02-20 00:21:00,902 INFO
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
TaskTrackerMetrics registered.
2013-02-20 00:21:00,971 INFO org.apache.hadoop.ipc.Server: Starting
SocketReader
2013-02-20 00:21:00,973 INFO
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
RpcDetailedActivityForPort64357 registered.
2013-02-20 00:21:00,973 INFO
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
RpcActivityForPort64357 registered.
2013-02-20 00:21:00,977 INFO org.apache.hadoop.ipc.Server: IPC Server
Responder: starting
2013-02-20 00:21:00,977 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 64357: starting
2013-02-20 00:21:00,978 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 64357: starting
2013-02-20 00:21:00,978 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 64357: starting
2013-02-20 00:21:00,978 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 2 on 64357: starting
2013-02-20 00:21:00,978 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 3 on 64357: starting
2013-02-20 00:21:00,979 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 4 on 64357: starting
2013-02-20 00:21:00,979 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 5 on 64357: starting
2013-02-20 00:21:00,979 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 6 on 64357: starting