Can not auto-failover when unplug network interface
Hi,
   Another auto-failover testing problem:

   My HA setup fails over automatically when I kill the active NN. But when I
unplug the network interface to simulate a hardware failure, auto-failover
does not seem to happen, even after waiting for some time. The zkfc logs are
in [1].

   I'm using the default sshfence.
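
For readers unfamiliar with it: sshfence is selected through the dfs.ha.fencing.methods property in hdfs-site.xml. A minimal sketch of such a setup follows. It is not copied from this cluster; the private key path is a hypothetical placeholder and the timeout value is only illustrative.

```xml
<!-- Sketch of an sshfence-only fencing setup in hdfs-site.xml (assumed, not this cluster's actual file). -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <!-- Only one method is listed, matching "Trying method 1/1" in the zkfc log. -->
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <!-- Hypothetical path; the log only shows that the SSH user is "hadoop". -->
  <value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <!-- SSH connect timeout in milliseconds; illustrative value. -->
  <value>30000</value>
</property>
```

sshfence has to open an SSH connection to the previously active node before the standby may take over, so it cannot succeed while that node is unreachable. That matches the NoRouteToHostException for hadoop3 port 22 in [1].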
[1] zkfc logs
----------------------------------------------------------------------------------------
2013-12-03 10:05:56,650 INFO org.apache.hadoop.ha.NodeFencer: =====Beginning Service Fencing Process... =====
2013-12-03 10:05:56,650 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
2013-12-03 10:05:56,651 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to hadoop3...
2013-12-03 10:05:56,651 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to hadoop3 port 22
2013-12-03 10:05:59,648 WARN org.apache.hadoop.ha.SshFenceByTcpPort: Unable to connect to hadoop3 as user hadoop
com.jcraft.jsch.JSchException: java.net.NoRouteToHostException: No route to host
    at com.jcraft.jsch.Util.createSocket(Util.java:386)
    at com.jcraft.jsch.Session.connect(Session.java:182)
    at org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
    at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
    at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
    at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
    at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
    at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
    at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2013-12-03 10:05:59,649 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
2013-12-03 10:05:59,649 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
2013-12-03 10:05:59,650 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
java.lang.RuntimeException: Unable to fence NameNode at hadoop3/10.7.23.124:8020
    at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:522)
    at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
    at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
    at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
    at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2013-12-03 10:05:59,650 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2013-12-03 10:05:59,676 INFO org.apache.zookeeper.ZooKeeper: Session: 0x142931031810260 closed
2013-12-03 10:06:00,678 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop1:2181,hadoop2:2181,hadoop3:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@5ce2acea
2013-12-03 10:06:00,681 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hadoop1/10.7.23.122:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-12-03 10:06:00,681 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hadoop1/10.7.23.122:2181, initiating session
2013-12-03 10:06:00,709 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hadoop1/10.7.23.122:2181, sessionid 0x142931031810261, negotiated timeout = 5000
2013-12-03 10:06:00,711 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down