Troubleshooting High Availability Clusters in Virtual Environments

Hypervisor Users (Especially VMware ESX/ESXi)

The settings below are specific to VMware ESX/ESXi, but similar settings may be present on Hyper-V, VirtualBox, and other hypervisors.

  1. Enable promiscuous mode on the vSwitch

  2. Enable MAC Address changes

  3. Enable Forged transmits

  4. If multiple physical ports exist on the same vSwitch, the Net.ReversePathFwdCheckPromisc option must be enabled to work around a vSwitch bug where multicast traffic loops back to the host. The looped traffic prevents CARP from functioning and produces “link states coalesced” messages. See “Changing Net.ReversePathFwdCheckPromisc” below.
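The four requirements above can be expressed as a small checklist. The following is a minimal sketch, not a VMware API: the dictionary keys, the function name, and the idea of passing the uplink count and advanced setting as arguments are all assumptions made for illustration.

```python
# Hypothetical representation of a vSwitch or port group security policy.
# The key names are invented for this sketch; they are not VMware identifiers.
REQUIRED_FOR_CARP = {
    "promiscuous_mode": "Accept",
    "mac_address_changes": "Accept",
    "forged_transmits": "Accept",
}


def missing_carp_settings(policy, physical_uplinks=1,
                          reverse_path_fwd_check_promisc=0):
    """Return a list of problems that would keep CARP from working.

    Keys absent from `policy` are treated as "Reject", which is a
    conservative assumption made only for this sketch.
    """
    problems = []
    for key, wanted in REQUIRED_FOR_CARP.items():
        actual = policy.get(key, "Reject")
        if actual != wanted:
            problems.append(f"{key} is {actual}, expected {wanted}")
    # Item 4: only relevant when the vSwitch has more than one physical port.
    if physical_uplinks > 1 and reverse_path_fwd_check_promisc != 1:
        problems.append(
            "Net.ReversePathFwdCheckPromisc should be 1 with multiple uplinks"
        )
    return problems
```

Calling `missing_carp_settings` with all three policies set to "Accept" and a single uplink returns an empty list; any other combination returns one message per unmet requirement.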

ESX VDS Promisc Workaround

If a Virtual Distributed Switch is in use, a port group can be made for the firewall interfaces with promiscuous mode enabled, and a separate non-promiscuous port group may be used for other hosts. Forum users have reported that this works as a way to balance the requirements for letting CARP function against the need to secure client ports.

ESX VDS Upgrade Issue

If a VDS (Virtual Distributed Switch) is used in ESX 4.0 or 4.1 and an upgrade to 4.1 or 5.0 is performed, the upgraded VDS will not properly pass CARP traffic. A new VDS created on 4.1 or 5.0 will work, but the upgraded VDS will not.

Disabling promiscuous mode on the VDS and then re-enabling it reportedly resolves the issue.

ESX VDS Port Mirroring Issue

If port mirroring is enabled on a VDS, it will break promiscuous mode. To fix this, disable promiscuous mode and then re-enable it.

Client Port Issues

If a physical CARP cluster is connected to a switch with an ESX box using multiple ports on the ESX box (lagg group or similar), and only certain devices/IPs are reachable by the target VM, then the port group settings in ESX may need to be adjusted so that load balancing for the group hashes based on IP, not on the originating interface.

Side effects of having that set incorrectly include:

  • Traffic only reaching the target VM when its NIC is in promiscuous mode

  • Inability to reach the CARP IP from the target VM when the “real” IP of the primary firewall is reachable

  • Port forwards or other inbound connections to the target VM work from some IPs and not others
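The difference between the two load-balancing modes can be illustrated with a toy model. This is a simplified sketch and not VMware's actual algorithm: the function names, the use of a plain XOR of the two addresses, and the modulo step are assumptions chosen only to show why one mode sends all of a VM's traffic through a single uplink while the other spreads it by source/destination pair.

```python
import ipaddress


def uplink_by_ip_hash(src_ip, dst_ip, uplink_count):
    """Toy IP-hash model: the uplink depends on the source/destination pair."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % uplink_count


def uplink_by_port_id(virtual_port_id, uplink_count):
    """Toy originating-port model: the uplink depends only on the VM's port."""
    return virtual_port_id % uplink_count


# With the port-based model, every destination uses the same uplink.
port_based = {dst: uplink_by_port_id(7, 2) for dst in ("10.0.0.1", "10.0.0.2")}

# With the IP-hash model, different destinations may use different uplinks.
ip_based = {
    dst: uplink_by_ip_hash("10.0.0.10", dst, 2)
    for dst in ("10.0.0.1", "10.0.0.2")
}
```

In this model `port_based` maps both destinations to uplink 1, while `ip_based` maps 10.0.0.1 to uplink 1 and 10.0.0.2 to uplink 0. A physical switch configured for link aggregation expects the second behavior, which is consistent with the symptom of some IPs being reachable and others not when the modes disagree.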

Changing Net.ReversePathFwdCheckPromisc

Log in to the VMware vSphere Client.

For each VMware host:

  • Click the host to configure and select the Configuration tab

  • Click Advanced Settings in the Software section of the left pane

  • Click Net, scroll down to Net.ReversePathFwdCheckPromisc, and set it to 1

  • Click OK

Promiscuous mode must now be enabled on the relevant interfaces or, if it is already enabled, toggled off and back on. This is done per host by clicking Networking in the Hardware section:

  • For each vSwitch and/or Virtual Machine Port Group:

    • NOTE: If promiscuous mode is already enabled, it must be disabled and saved, then re-enabled and saved again.

    • Click Properties on the vSwitch

    • By default, Promiscuous Mode is set to Reject

    • To change it, click Edit > Security tab

    • Select Accept from the drop-down

    • Click OK.

  • However, this setting is usually applied per Virtual Machine Port Group, which is more secure, while the vSwitch is left at its default of Reject.

    • Edit > Security > Policy Exceptions

    • Uncheck Promiscuous Mode

    • Click OK

    • Edit > Security > Policy Exceptions

    • Check Promiscuous Mode and select Accept.

More information available from VMware

ESX Physical NIC Failure Fails to Trigger Failover

Self-demotion in CARP relies on the loss of link on a switch port. If the primary and secondary firewall instances are on separate ESX units and the primary's ESX unit loses a switch port link without exposing that loss to the VM, CARP on the primary will stay MASTER on all of its VIPs. The secondary, which no longer receives the primary's advertisements, will also believe it should be MASTER. One way around this is to script an event in ESX that takes down the switch port on the VM when the physical port loses link. There may be other ways around this in ESX as well.
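The failure mode and the scripted workaround can be modeled in a few lines. This is a toy sketch under stated assumptions: the function names and the two-node model are hypothetical, it ignores preemption and advertisement timing, and it does not call any ESX interface. How the VM's port is actually taken down is left to whatever scripting facility the ESX installation provides.

```python
def desired_vnic_state(uplink_states):
    """Decide what the scripted event should do with the VM's virtual NIC.

    `uplink_states` maps a physical NIC name to True (link up) or False.
    The VM's port stays connected only while at least one uplink has link.
    """
    return "connected" if any(uplink_states.values()) else "disconnected"


def carp_roles(primary_vnic_up, primary_uplink_up):
    """Toy model of the CARP state each node ends up in.

    The primary only demotes itself when its own interface loses link.
    The secondary takes over whenever the primary's advertisements stop
    arriving, which happens if either the virtual or physical link is down.
    """
    primary = "MASTER" if primary_vnic_up else "INIT"
    secondary = "BACKUP" if (primary_vnic_up and primary_uplink_up) else "MASTER"
    return primary, secondary
```

With the physical link down but the virtual NIC still up, `carp_roles(True, False)` yields MASTER on both nodes. Once the script disconnects the virtual NIC, `carp_roles(False, False)` leaves only the secondary as MASTER.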

VMware Workstation

If VMware Workstation is used on Linux for testing/modeling and CARP does not function, the likely cause is that VMware Workstation is running as a non-root user and cannot put the vmnet adapter into promiscuous mode.

The permissions on /dev/vmnet* should be changed such that the user running VMware Workstation is allowed to modify the /dev/vmnet* devices. See the VMware KB for details.

To make the change permanent, edit /etc/init.d/vmware and, in the function vmwareStartVmnet(), add commands to chgrp and chown the vmnet devices to a group that contains the user running VMware Workstation.
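As a rough illustration of the permission change, the sketch below only builds the commands and does not run them, so they can be reviewed first. The group name `vmusers` is hypothetical, and the choice of `chgrp` followed by `chmod g+rw` is an assumption about what "allowed to modify" requires; the VMware KB remains the authority on the exact commands.

```python
import glob
import shlex


def vmnet_permission_commands(group, devices=None):
    """Build (but do not run) commands granting `group` access to vmnet devices.

    If `devices` is None, every path matching /dev/vmnet* is used.
    """
    if devices is None:
        devices = sorted(glob.glob("/dev/vmnet*"))
    commands = []
    for dev in devices:
        commands.append(["chgrp", group, dev])   # hand the device to the group
        commands.append(["chmod", "g+rw", dev])  # let group members read/write
    return commands


def as_shell_lines(commands):
    """Render the commands as shell lines, e.g. for pasting into a script."""
    return [shlex.join(cmd) for cmd in commands]
```

For example, `as_shell_lines(vmnet_permission_commands("vmusers", ["/dev/vmnet0"]))` gives the two lines `chgrp vmusers /dev/vmnet0` and `chmod g+rw /dev/vmnet0`, which could then be reviewed before being added to the startup script.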


Be sure to use e1000 NICs (em(4)), not ed(4) NICs, or CARP VIPs will never leave the INIT state.

VirtualBox Issues

As reported in a forum thread:

  • Setting “Promiscuous Mode: Allow All” on the relevant interfaces of the VM allows CARP to function on any interface type (Bridged, Host-Only, Internal)