Troubleshooting High Availability Clusters in Virtual Environments
Hypervisor users (Especially VMware ESX/ESXi)
The settings below are specific to VMware ESX/ESXi, but similar settings may exist in Hyper-V, VirtualBox, and other hypervisors.
Enable promiscuous mode on the vSwitch
Enable MAC address changes
Enable forged transmits
If multiple physical ports exist on the same vSwitch, enable the Net.ReversePathFwdCheckPromisc option to work around a vSwitch bug where multicast traffic loops back to the host, which stops CARP from functioning and produces “link states coalesced” messages. (See below)
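The requirements in the list above can be expressed as a quick checklist. The following is a minimal sketch, assuming a hypothetical in-memory model of a vSwitch: the SecurityPolicy and VSwitchConfig classes and the carp_problems function are illustrative names, not part of any VMware API. It only shows which combinations of settings the list rules out.

```python
from dataclasses import dataclass


@dataclass
class SecurityPolicy:
    # Hypothetical model of the three vSwitch/port group security settings.
    promiscuous_mode: bool
    mac_address_changes: bool
    forged_transmits: bool


@dataclass
class VSwitchConfig:
    # Hypothetical model of one vSwitch plus the relevant host advanced option.
    policy: SecurityPolicy
    physical_uplinks: int
    reverse_path_fwd_check_promisc: int  # value of Net.ReversePathFwdCheckPromisc


def carp_problems(cfg: VSwitchConfig) -> list:
    """Return the checklist items that this configuration fails."""
    problems = []
    if not cfg.policy.promiscuous_mode:
        problems.append("promiscuous mode is not enabled")
    if not cfg.policy.mac_address_changes:
        problems.append("MAC address changes are not enabled")
    if not cfg.policy.forged_transmits:
        problems.append("forged transmits are not enabled")
    # The loopback workaround only matters when the vSwitch has several uplinks.
    if cfg.physical_uplinks > 1 and cfg.reverse_path_fwd_check_promisc != 1:
        problems.append("multiple uplinks but Net.ReversePathFwdCheckPromisc is not 1")
    return problems


if __name__ == "__main__":
    cfg = VSwitchConfig(
        policy=SecurityPolicy(promiscuous_mode=True, mac_address_changes=True, forged_transmits=False),
        physical_uplinks=2,
        reverse_path_fwd_check_promisc=0,
    )
    for problem in carp_problems(cfg):
        print(problem)
```

For the example configuration it reports two unmet requirements: forged transmits are not enabled, and the vSwitch has two uplinks while Net.ReversePathFwdCheckPromisc is still 0.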
ESX VDS Promisc Workaround
If a Virtual Distributed Switch (VDS) is in use, create a port group with promiscuous mode enabled for the firewall interfaces and a separate, non-promiscuous port group for other hosts. Users on the forum report that this works and strikes a balance between letting CARP function and keeping client ports secure.
ESX VDS Upgrade Issue
If a VDS (Virtual Distributed Switch) is used on ESX 4.0 or 4.1 and the host is then upgraded to 4.1 or 5.0, the upgraded VDS will not properly pass CARP traffic. A new VDS created on 4.1 or 5.0 works; only the upgraded VDS is affected.
Disabling promiscuous mode on the VDS and then re-enabling it is reported to resolve the issue.
ESX VDS Port Mirroring Issue
Enabling port mirroring on a VDS breaks promiscuous mode. To fix it, disable promiscuous mode and then re-enable it.
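Both VDS fixes come down to the same sequence: turn promiscuous mode off and apply the change, then turn it back on and apply it again. The following is a minimal sketch of that ordering, assuming a hypothetical PortGroupSecurity interface that stands in for however the port group is edited (vSphere Client or an API); neither it nor the RecordingPortGroup stand-in is a real VMware class.

```python
from typing import Protocol


class PortGroupSecurity(Protocol):
    # Hypothetical interface standing in for the vSphere UI or API used to edit
    # a port group's security policy; not a real library class.
    def set_promiscuous(self, accept: bool) -> None: ...
    def save(self) -> None: ...


def toggle_promiscuous(pg: PortGroupSecurity) -> None:
    """Turn promiscuous mode off and commit, then back on and commit again."""
    pg.set_promiscuous(False)
    pg.save()  # the disabled state must actually be applied...
    pg.set_promiscuous(True)
    pg.save()  # ...before the re-enabled state is saved


class RecordingPortGroup:
    # Stand-in implementation that only records calls, for illustration.
    def __init__(self) -> None:
        self.calls = []

    def set_promiscuous(self, accept: bool) -> None:
        self.calls.append(("set_promiscuous", accept))

    def save(self) -> None:
        self.calls.append(("save",))
```

The save between the two changes is the important part: flipping the setting off and on within a single edit, without saving in between, does not re-apply it.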
Client Port Issues
If a physical CARP cluster is connected to a switch that an ESX host also connects to over multiple ports (a lagg group or similar), and the target VM can reach only certain devices or IPs, the load balancing policy for the port group in ESX may need to be changed to hash based on IP ("Route based on IP hash") instead of the originating virtual port.
Symptoms of an incorrect load balancing setting include:
Traffic reaching the target VM only when its NIC is in promiscuous mode
Inability to reach the CARP IP from the target VM even though the “real” IP of the primary firewall is reachable
Port forwards or other inbound connections to the target VM working from some IPs but not others
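The difference between the two load balancing policies can be illustrated with a simplified model. This sketch assumes an IP hash that XORs the source and destination IPv4 addresses and takes the result modulo the number of uplinks, and an originating-port policy that pins each virtual port to one uplink; ESX's exact computations may differ, and the function names are hypothetical.

```python
import ipaddress


def ip_hash_uplink(src_ip: str, dst_ip: str, uplink_count: int) -> int:
    """Simplified IP-hash policy: XOR the source and destination IPv4
    addresses, then take the result modulo the number of uplinks.
    Illustration only; ESX's exact hash computation may differ."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % uplink_count


def port_id_uplink(virtual_port_id: int, uplink_count: int) -> int:
    """Simplified originating-port policy: each virtual port is pinned to one
    uplink no matter which peer it talks to."""
    return virtual_port_id % uplink_count


if __name__ == "__main__":
    vm_ip = "10.0.0.50"
    for peer in ("10.0.0.1", "10.0.0.2"):
        print(peer, "ip-hash uplink:", ip_hash_uplink(vm_ip, peer, 2),
              "port-id uplink:", port_id_uplink(7, 2))
```

With two uplinks, the VM's flows to 10.0.0.1 and 10.0.0.2 land on uplinks 1 and 0 under the IP-hash model, while the port-ID model keeps both on uplink 1. Spreading traffic per source/destination pair is what a lagg group on the physical switch side expects, which is why VMware pairs the IP-hash policy with a static link aggregation.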
Enabling Net.ReversePathFwdCheckPromisc
Log in to the VMware vSphere Client.
For each VMware host:
Click the host to configure and select the Configuration tab.
Click Advanced Settings in the Software section of the left pane.
Click Net, scroll down to Net.ReversePathFwdCheckPromisc, and set it to 1.
Promiscuous mode must now be set on the interfaces, or toggled off and back on if it is already enabled. This is done per host by clicking Networking in the Hardware section.
For each vSwitch and/or Virtual Machine Port Group:
NOTE: If promiscuous mode is already enabled, it must be disabled and saved, then re-enabled and saved.
Click Properties of the vSwitch.
By default, Promiscuous Mode is set to Reject.
To change it, click Edit and open the Security tab.
Select Accept from the drop-down.
However, this setting is usually applied per Virtual Machine Port Group (more secure), with the vSwitch left at its default of Reject. To toggle it on a port group:
Open Edit > Security > Policy Exceptions, uncheck Promiscuous Mode, and save.
Open Edit > Security > Policy Exceptions again, check Promiscuous Mode, select Accept, and save.
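The loopback problem that Net.ReversePathFwdCheckPromisc works around can be pictured with a simplified model. This sketch assumes the option's effect is to stop a promiscuous port from receiving frames that arrive on a physical uplink but carry the source MAC of a port on the same vSwitch, that is, reflected copies of the firewall's own multicast. It is not ESX's actual implementation; the Frame type, the function, and the MAC strings are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    src_mac: str
    ingress: str  # "local" if sent by a vNIC on this vSwitch, else the uplink name


def promisc_port_receives(frame: Frame, local_macs: set, rpf_check_promisc: int) -> bool:
    """Simplified model of Net.ReversePathFwdCheckPromisc for a promiscuous port.

    With the check enabled (1), a frame that arrives on a physical uplink but
    carries the source MAC of a port on this same vSwitch is treated as a
    reflected copy of local traffic and is not delivered.
    """
    from_uplink = frame.ingress != "local"
    if rpf_check_promisc == 1 and from_uplink and frame.src_mac in local_macs:
        return False
    return True


if __name__ == "__main__":
    local = {"fw1-lan-mac"}  # placeholder MAC of the firewall's own vNIC
    # The firewall's own multicast advertisement, reflected back in on a second uplink.
    reflected = Frame(src_mac="fw1-lan-mac", ingress="vmnic1")
    print(promisc_port_receives(reflected, local, rpf_check_promisc=0))  # True: the copy loops back
    print(promisc_port_receives(reflected, local, rpf_check_promisc=1))  # False: the copy is dropped
```

Frames from other hosts and frames sent locally are unaffected in this model; only the reflected copies are filtered.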
ESX Physical NIC Failure Fails to Trigger Failover
CARP self-demotion relies on detecting loss of link on a switch port. If the primary and secondary firewall instances run on separate ESX hosts and the primary host loses link on a physical switch port without exposing that to the VM, CARP on the primary stays MASTER on all of its VIPs, while the secondary, which no longer receives the primary's advertisements, also believes it should be MASTER. One workaround is to script an event in ESX that takes down the VM's switch port when the physical port loses link. ESX may offer other workarounds as well.
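The scripted workaround amounts to mirroring the physical link state onto the VM's virtual port. The following is a minimal sketch of that polling loop, assuming two hypothetical hooks, read_uplink_up and set_vm_port_connected, that would wrap whatever ESX tooling reports physical NIC link state and connects or disconnects the VM's vNIC; no specific ESX command or API is implied.

```python
import time
from typing import Callable, List


def mirror_link_state(
    read_uplink_up: Callable[[], bool],
    set_vm_port_connected: Callable[[bool], None],
    polls: int,
    interval: float = 1.0,
) -> List[bool]:
    """Poll the physical uplink and copy each state change to the VM's port.

    Both callables are hypothetical hooks supplied by the caller.
    Returns the sequence of states that were applied to the VM's port.
    """
    applied: List[bool] = []
    last = None
    for _ in range(polls):
        up = read_uplink_up()
        if up != last:
            # Propagate only transitions, so the VM sees link down when the
            # physical port does and CARP on the VM can demote itself.
            set_vm_port_connected(up)
            applied.append(up)
            last = up
        time.sleep(interval)
    return applied
```

Propagating only transitions keeps the loop from repeatedly reconnecting a port that is already in the right state.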
VMware Workstation on Linux
If CARP does not function when using VMware Workstation on Linux for testing or modeling, the likely cause is that VMware Workstation is running as a non-root user and cannot put the vmnet adapter into promiscuous mode.
The permissions on /dev/vmnet* should be changed so that the user running VMware Workstation is allowed to modify the devices. See the VMware KB for more information.
To make the change permanent, edit /etc/init.d/vmware and, in the vmwareStartVmnet() function, add commands that chgrp the vmnet devices to a group containing the user running VMware Workstation and chmod them to give that group read/write access.
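The permission change itself is a chgrp followed by a chmod g+rw on each vmnet device. The following is a minimal Python sketch of that step, assuming the placeholder group name "vmware" (substitute any group that contains the user running VMware Workstation); grant_group_access is a hypothetical helper, and changing the real /dev/vmnet* devices requires root.

```python
import glob
import os
import shutil
import stat


def grant_group_access(pattern: str = "/dev/vmnet*", group="vmware") -> list:
    """chgrp every path matching pattern to group, then add g+rw.

    "vmware" is a placeholder; use a group that contains the user running
    VMware Workstation. group may be a group name or a numeric gid.
    """
    changed = []
    for path in sorted(glob.glob(pattern)):
        shutil.chown(path, group=group)  # like: chgrp <group> <path>
        mode = stat.S_IMODE(os.stat(path).st_mode)
        os.chmod(path, mode | stat.S_IRGRP | stat.S_IWGRP)  # like: chmod g+rw <path>
        changed.append(path)
    return changed
```

This covers only the one-time change; it does not persist across restarts, which is why the equivalent commands belong in vmwareStartVmnet().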
Be sure to use e1000 NICs (em(4)) rather than ed(4) NICs, or CARP VIPs will never leave the INIT state.