In a cluster with the host isolation response set to Leave powered on, a virtual machine may display this error when a host becomes isolated:
vSphere HA virtual machine failed to failover
· The virtual machine continues to run without a problem.
This article provides information to:
· Clear the vSphere HA virtual machine failed to failover error from the virtual machine.
· Deal with the vSphere HA virtual machine failed to failover error if it occurs.
· Reduce the occurrence of the vSphere HA virtual machine failed to failover error.
This behavior can occur whenever a High Availability master agent declares a host dead. However, the virtual machines continue to run without incident. This alarm does not mean HA has failed or stopped working. When this alarm is triggered, it means that one or more virtual machines failed to get powered on by a host in a cluster protected by HA.
Possible reasons for this to happen:
· The host is still running but has disconnected from the network. The cluster's host isolation response is set to Leave powered on:
· When a host becomes network isolated, the remaining hosts in the cluster do not know if the host has crashed, or is just disconnected from the network. As a result, the remaining hosts attempt to power up the virtual machines that were last logged as running on the isolated host. With Leave powered on, the host that became network isolated will leave the virtual machines up and running and not attempt to power them down, thus keeping the locks on the files. With the isolated host locking the files, the remaining hosts will fail to perform the power on task on the virtual machines resulting in the alarm triggering.
· The host is still running but has disconnected from the network. The cluster's host isolation response is set to Shut down or Power off:
· With this host isolation response, a host attempts to send shut down or power off commands to its running virtual machines when it recognizes that it is isolated. Once a virtual machine is completely shut down and the isolated host no longer holds locks on the virtual machine's files, the remaining hosts in the cluster can obtain the locks necessary to power up the virtual machine. If the virtual machine is not successfully shut down, or the locks are not released, the alarm is triggered.
· The host has failed and the virtual machine storage is in a degraded state. The remaining hosts in the cluster cannot contact the storage device and fail to power up the virtual machines, resulting in the alarm triggering.
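The three cases above can be summarized as a small decision function. This is only an illustrative sketch of the reasoning in this article, not VMware code: the `IsolationScenario` class, its field names, and the response strings are hypothetical, and the outcome for a failed host with reachable storage is an assumption, since the article does not describe that case.

```python
from dataclasses import dataclass

# Hypothetical labels for the three host isolation responses named in the article.
VALID_RESPONSES = {"leave_powered_on", "shut_down", "power_off"}


@dataclass
class IsolationScenario:
    host_failed: bool          # True if the host has really failed (not just isolated)
    isolation_response: str    # one of VALID_RESPONSES
    vm_stopped_cleanly: bool   # the isolated host managed to stop the virtual machine
    locks_released: bool       # locks on the virtual machine's files were released
    storage_reachable: bool    # remaining hosts can contact the storage device


def failover_alarm_expected(s: IsolationScenario) -> bool:
    """Return True if the remaining hosts would fail to power on the VM."""
    if s.isolation_response not in VALID_RESPONSES:
        raise ValueError(f"unknown isolation response: {s.isolation_response}")
    # Degraded storage: the remaining hosts cannot reach the files at all.
    if not s.storage_reachable:
        return True
    # Assumption (not stated in the article): a failed host with healthy
    # storage does not block the restart.
    if s.host_failed:
        return False
    # Leave powered on: the isolated host keeps the VM running and keeps the locks.
    if s.isolation_response == "leave_powered_on":
        return True
    # Shut down / Power off: the restart works only if the VM stopped
    # and the locks were released.
    return not (s.vm_stopped_cleanly and s.locks_released)


if __name__ == "__main__":
    isolated = IsolationScenario(False, "leave_powered_on", False, False, True)
    print(failover_alarm_expected(isolated))
```

With Leave powered on, the function reports that the alarm is expected for any isolated host, which matches the statement that the alarm does not mean HA has failed.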
This is expected behavior in VMware vCenter Server 5.0.x, 5.1.x and 5.5.x. Because the virtual machines continue to run without incident, you can safely ignore this issue.
To clear the alarm from the virtual machine:
1. Select the virtual machine with the triggered alarm.
2. Click on the Alarms tab and then the Triggered Alarms button.
3. Right-click the vSphere HA virtual machine failover failed alarm and click Clear.
Note: If this alarm is on multiple virtual machines, you may select the host, cluster, data center, or vCenter Server object in the left pane and continue with step 2 to clear the alarms with fewer steps.
For more information on dealing with alerts, see:
· vCenter Server 5.0 - the Acknowledge Triggered Alarms section in the vSphere 5.0 Documentation Center.
· vCenter Server 5.1 - the Acknowledge Triggered Alarms section in the vSphere 5.1 Monitoring and Performance Guide.
· vCenter Server 5.5 - the Acknowledge Triggered Alarms in the vSphere Web Client section in the vSphere 5.5 Monitoring and Performance Guide.
To reduce the likelihood of this issue occurring:
· Use multiple management networks. For more information, see vSphere High Availability Deployment Best Practices.
· Ensure the datastore heartbeats within vCenter Server are communicating properly for HA to run efficiently when management network problems occur.
For example, if using both SAN and IP-based storage, mount a couple of SAN-based datastores to the hosts in the cluster so that HA can use them for heartbeating instead of the IP-based storage. Or, if only IP-based storage is used, consider fault-isolating one or more of the storage networks from the management network.
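The datastore recommendation can be checked against an inventory list with a few lines of code. The following is a minimal sketch under stated assumptions: the `Datastore` class, its fields, and the datastore names are hypothetical and do not come from any vSphere API, and the default of two candidates simply mirrors "a couple" in the text above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Datastore:
    name: str
    transport: str             # hypothetical label: "san" or "ip"
    shares_mgmt_network: bool  # True if storage traffic uses the management network


def independent_heartbeat_candidates(datastores: List[Datastore]) -> List[str]:
    """Names of datastores that stay reachable when the management network fails."""
    return [
        d.name
        for d in datastores
        # SAN-based datastores do not depend on the IP management network;
        # IP-based ones qualify only if their network is fault-isolated from it.
        if d.transport == "san" or not d.shares_mgmt_network
    ]


def heartbeat_coverage_ok(datastores: List[Datastore], minimum: int = 2) -> bool:
    """True if at least `minimum` datastores are independent of the management network."""
    return len(independent_heartbeat_candidates(datastores)) >= minimum


if __name__ == "__main__":
    inventory = [
        Datastore("fc-01", "san", False),
        Datastore("nfs-01", "ip", True),
        Datastore("fc-02", "san", False),
    ]
    print(independent_heartbeat_candidates(inventory))
    print(heartbeat_coverage_ok(inventory))
```

An IP-based datastore counts as a candidate only when its storage network is separated from the management network, which corresponds to the second option described above.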
Source: ITPUB blog, link: http://blog.itpub.net/31015730/viewspace-2148948/. If you repost this article, please credit the source; otherwise legal action may be pursued.