Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hadoop >> mail # dev >> Needs Help: Hang during DataNode shutdown


Copy link to this message
-
Needs Help: Hang during DataNode shutdown

Problem Abstract: TestCLI hangs
Hadoop source code version: 1.0.3
Test Environment: RHEL 6.1 + IBM JAVA 7 JVM
Development Environment RHEL 6.1 + IBM JAVA 7 SDK

cause of the hang = At the end of TestCLI and during tearDown() cleanup,
hdfs shutdown is called. As the result of hdfs cluster shutdown, all data
nodes are also subjected to shutdown.
However during DataNode shutdown process, thread hangs on a call to
ServerSocket.close() in DataXceiverServer>kill() method, thus
AsynchronousCloseException never get generated that in turn doesn't
terminate DataXceiverServer.run() thread, HANG.
I have pasted a portion of testcase log file pointing at the location of
the hang.

Has anyone experienced a similar problem? Any suggestion on how to debug a
serversocket hang ?

LOG FILE:
====================================Shutting down the Mini HDFS Cluster
Shutting down DataNode 0
2012-10-08 09:58:11,347 INFO  mortbay.log (Slf4jLog.java:info(67)) -
Stopped SelectChannelConnector@localhost:0
2012-10-08 09:58:11,349 INFO  ipc.Server (Server.java:stop(1589)) -
Stopping server on 56425
2012-10-08 09:58:11,350 INFO  ipc.Server (Server.java:run(1428)) - IPC
Server handler 0 on 56425: exiting
2012-10-08 09:58:11,350 INFO  ipc.Server (Server.java:run(1428)) - IPC
Server handler 1 on 56425: exiting
2012-10-08 09:58:11,350 INFO  ipc.Server (Server.java:run(1428)) - IPC
Server handler 2 on 56425: exiting
2012-10-08 09:58:11,350 INFO  ipc.Server (Server.java:run(466)) - Stopping
IPC Server listener on 56425
2012-10-08 09:58:11,350 INFO  metrics.RpcInstrumentation
(RpcInstrumentation.java:shutdown(154)) - shut down

   before call to close socketserver /127.0.0.1:40462   bond to true
<==================================================== print before call to
ss.close() in DataXceiverServer>kill()
   <<<< HANG on ServerSocket.close() .. as the result
AsynchronousCloseException never get generated ==> run() thread doesn't get
terminated  .. >>>>>>>

2012-10-08 09:58:11,351 INFO  ipc.Server (Server.java:run(663)) - Stopping
IPC Server Responder
2012-10-08 09:58:11,427 INFO  datanode.DataNode (DataNode.java:run(1470)) -
DatanodeRegistration(127.0.0.1:40462,
storageID=DS-1568625081-127.0.0.1-40462-1349708246395, infoPort=48177,
ipcPort=56425):Finishing DataNode in: FSDataset
{dirpath='/home/sanjar/development/hadoop-common-1.0.3/build/test/data/dfs/data/data1/current,/home/sanjar/development/hadoop-common-1.0.3/build/test/data/dfs/data/data2/current'}
2012-10-08 09:58:11,427 WARN  util.MBeans (MBeans.java:unregister(73)) -
Hadoop:service=DataNode,name=DataNodeInfo
javax.management.InstanceNotFoundException:
Hadoop:service=DataNode,name=DataNodeInfo
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean
(DefaultMBeanServerInterceptor.java:1107)
at
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean
(DefaultMBeanServerInterceptor.java:439)
at
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean
(DefaultMBeanServerInterceptor.java:427)
at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean
(JmxMBeanServer.java:548)
at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:71)
at org.apache.hadoop.hdfs.server.datanode.DataNode.unRegisterMXBean
(DataNode.java:522)
at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown
(DataNode.java:737)
at org.apache.hadoop.hdfs.server.datanode.DataNode.run
(DataNode.java:1471)
at java.lang.Thread.run(Thread.java:779)
2012-10-08 09:58:11,428 INFO  ipc.Server (Server.java:stop(1589)) -
Stopping server on 56425
2012-10-08 09:58:11,428 INFO  metrics.RpcInstrumentation
(RpcInstrumentation.java:shutdown(154)) - shut down
before call to close socketserver /127.0.0.1:40462   bond to true
2012-10-08 09:58:11,450 INFO  datanode.DataBlockScanner
(DataBlockScanner.java:run(630)) - Exiting DataBlockScanner thread.
2012-10-08 09:58:13,271 INFO  namenode.FSNamesystem
(QueueProcessingStatistics.java:logEndFirstCycle(221)) - ReplicateQueue
QueueProcessingStatistics: First cycle completed 0 blocks in 0 msec
2012-10-08 09:58:13,273 INFO  namenode.FSNamesystem
(QueueProcessingStatistics.java:logEndLastCycle(234)) - ReplicateQueue
QueueProcessingStatistics: Queue flush completed 0 blocks in 0 msec
processing time, 0 msec clock time, 1 cycles
2012-10-08 09:58:13,279 INFO  hdfs.StateChange
(FSNamesystem.java:invalidateWorkForOneNode(3199)) - BLOCK* ask
127.0.0.1:40462 to delete  blk_3105753263246905134_1138
blk_7178210216376038821_1035 blk_802519068887323225_1133
blk_-8493096021534159067_1149 blk_7602176121734813288_1171
blk_8668404524952190956_1136 blk_-6042918812810401975_1165
blk_5676309528403925090_1040 blk_1765642037028361222_1129
blk_8092034956650457458_1050 blk_-6648681910233280482_1134
blk_-1683744535295718305_1151 blk_3218539468768911033_1156
blk_5577719696046424228_1158 blk_-6520923586836362454_1172
blk_-6113270872737030099_1041 blk_5341998048760012359_1135
blk_-6969931240772459560_1121 blk_8394808289263836423_1045
blk_8539484182694882993_1146 blk_7671681651891508064_1126
blk_-2776317340570262361_1177 blk_2447083816854452599_1043
blk_-7481685483228902398_1196 blk_-6815661701610337288_1191
blk_3688560883394255137_1176 blk_1263533575967051168_1161
blk_-2692509525469741843_1130 blk_-2920344524512388710_1038
blk_2504931362409054454_1053 blk_-6706249849783565708_1132
blk_5395595878010222434_1194 blk_-4759132823454851626_1139
blk_-7607754639974630346_1140 blk_4506336083156819081_1141
blk_-2008642227253224114_1052 blk_3078592410791570491_1037
blk_9153438704688011593_1153 blk_-7131254408759933522_1122
blk_-2893067452191269835_1175 blk_939939854824782723_1174
blk_-8652591700089908126_1047 blk_-7991126287748428882_1162
blk_-6534381843790270462_1051 blk_-2480347688448975821_1167
blk_1836999865251288060_1192 blk_4837836637390714385_1124
blk_6419085183537620522_1178 blk_-1635920550266181839_1159
blk_1578539823433115335_1157 blk_1662403638206376616_1166
blk_-2376949396131258885_1046 blk_-7492133789334675866_1128
blk_-279683641357271853_11
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB