Troubleshooting High Availability Clusters in Virtual Environments
Hypervisor Users (Especially VMware ESX/ESXi)
The settings below are specific to VMware ESX/ESXi, but similar settings may be present on Hyper-V, VirtualBox, and other hypervisors.
Note
These notes all apply to CARP VIPs in multicast mode. Unicast mode CARP on pfSense Plus software may not require these settings, but experiences may vary by hypervisor and environment.
Enable promiscuous mode on the vSwitch
Enable MAC Address changes
Enable Forged transmits
If multiple physical ports exist on the same vSwitch, the Net.ReversePathFwdCheckPromisc option must be enabled to work around a vSwitch bug where multicast traffic loops back to the host, which prevents CARP from functioning and produces “link states coalesced” messages. (See below)
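As a rough sketch, the first three settings in the list above could also be applied from the ESXi shell with `esxcli` rather than through the client. This assumes a standard (non-distributed) vSwitch and shell access to the host; the vSwitch name `vSwitch0`, the `run` helper, and the `DRY_RUN` variable are illustrative assumptions, not part of the original procedure. By default the script only prints the command it would run.

```sh
#!/bin/sh
# Sketch: apply the three vSwitch security settings from the ESXi shell.
# "vSwitch0" is a placeholder; substitute the vSwitch that carries CARP traffic.

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Accept promiscuous mode, MAC address changes, and forged transmits.
enable_carp_security_policy() {
    run esxcli network vswitch standard policy security set \
        --vswitch-name="$1" \
        --allow-promiscuous=true \
        --allow-mac-change=true \
        --allow-forged-transmits=true
}

enable_carp_security_policy "${VSWITCH:-vSwitch0}"
```

Running it with `DRY_RUN=0` on the host would execute the command instead of printing it.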
ESX VDS Promisc Workaround
If a Virtual Distributed Switch is in use, a port group can be made for the firewall interfaces with promiscuous mode enabled, and a separate non-promiscuous port group may be used for other hosts. This has been reported to work by users on the forum as a way to strike a balance between the requirements for letting CARP function and for securing client ports.
ESX VDS Upgrade Issue
If a VDS (Virtual Distributed Switch) is used in ESX 4.0 or 4.1 and an upgrade from 4.0 to 4.1 or 5.0 is performed, the upgraded VDS will not properly pass CARP traffic. A new VDS created on 4.1 or 5.0 will work, but the upgraded VDS will not.
It is reported that disabling promiscuous mode on the VDS and then re-enabling it will resolve the issue.
ESX VDS Port Mirroring Issue
If port mirroring is enabled on a VDS, it will break promiscuous mode. To fix this, disable promiscuous mode and then re-enable it.
Client Port Issues
If a bare metal HA cluster is connected to a switch with an ESX host using multiple ports on the ESX host (lagg group or similar), and only certain devices or IP addresses are reachable by the target VM, then the port group settings in ESX may need to be adjusted so that load balancing for the group hashes on IP address rather than on the originating interface.
Side effects of having that set incorrectly include:
Traffic only reaching the target VM when its NIC is in promiscuous mode
Inability to reach the CARP VIP from the target VM when the “real” IP address of the primary firewall is reachable
Port forwards or other inbound connections to the target VM working from some IP addresses and not others
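For a standard vSwitch, the load balancing change described above might be made from the ESXi shell roughly as follows. This is a sketch under assumptions: the port group name `VM Network`, the `run` helper, and the `DRY_RUN` variable are placeholders introduced here, and a distributed switch would need to be changed through vCenter instead. The commands are only printed unless `DRY_RUN=0` is set.

```sh
#!/bin/sh
# Sketch: switch a port group's NIC teaming policy to IP hash.
# The port group name is a placeholder; use the group serving the target VM.

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hash on IP address rather than on the originating virtual port.
set_portgroup_iphash() {
    run esxcli network vswitch standard portgroup policy failover set \
        --portgroup-name="$1" \
        --load-balancing=iphash
}

set_portgroup_iphash "${PORTGROUP:-VM Network}"
```

IP hash load balancing generally expects the physical switch ports to be configured as a matching static link aggregation group, which fits the lagg scenario described above.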
Changing Net.ReversePathFwdCheckPromisc
Log in to the VMware vSphere Client
For each VMware host:
Click the host to configure and select the Configuration tab
Click Software > Advanced Settings in the left pane
Click Net, scroll down to Net.ReversePathFwdCheckPromisc, and set it to 1
Click OK
Promiscuous mode must now be enabled on the relevant interfaces or, if it is already enabled, toggled off and then back on. This is done per host by clicking Networking in the Hardware section.
For each vSwitch and/or Virtual Machine Port Group:
Note
If promiscuous mode is already enabled, it must be disabled and saved, then re-enabled and saved again.
Click Properties for the vSwitch
By default, Promiscuous Mode is set to Reject.
Click Edit and select the Security tab
Select Accept from the drop-down
Click OK
However, this setting is usually applied per Virtual Machine Port Group, which is more secure, while the vSwitch is left at its default of Reject.
Navigate to Edit > Security > Policy Exceptions
Uncheck Promiscuous Mode
Click OK
Navigate to Edit > Security > Policy Exceptions
Check Promiscuous Mode and select Accept.
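The client steps above could be approximated from the ESXi shell on each host, as in the sketch below. It assumes a standard vSwitch and a hypothetical port group named `pfSense-LAN`; the `run` helper and `DRY_RUN` variable are also illustrative. Whether toggling promiscuous mode through `esxcli` has exactly the same effect as toggling it in the client is an assumption that should be verified in the target environment.

```sh
#!/bin/sh
# Sketch: set Net.ReversePathFwdCheckPromisc, then toggle promiscuous mode
# off and back on for one port group. Names are placeholders.

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Enable the reverse path forwarding check for promiscuous ports.
set_reverse_path_check() {
    run esxcli system settings advanced set \
        --option=/Net/ReversePathFwdCheckPromisc --int-value=1
}

# Disable and re-enable promiscuous mode so the new setting takes effect.
toggle_portgroup_promisc() {
    run esxcli network vswitch standard portgroup policy security set \
        --portgroup-name="$1" --allow-promiscuous=false
    run esxcli network vswitch standard portgroup policy security set \
        --portgroup-name="$1" --allow-promiscuous=true
}

set_reverse_path_check
toggle_portgroup_promisc "${PORTGROUP:-pfSense-LAN}"
```

With the default dry-run behavior the script prints the three commands so they can be reviewed before running them with `DRY_RUN=0`.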
ESX Physical NIC Failure Fails to Trigger Failover
Self-demotion of a CARP VIP relies on the loss of link on a switch port. As such, if the primary and secondary nodes are on separate ESX hosts and the primary ESX host loses a switch port link but does not expose that to the VM, CARP on the primary will stay MASTER on all of its VIPs and the secondary will also believe it should be MASTER. One way around this is to script an event in ESX that takes down the switch port on the VM if the physical port loses link. There may be other ways around this in ESX as well.
VMware Workstation
If using VMware Workstation on Linux for testing/modeling and CARP failover does not function, it is likely because VMware Workstation is running as a non-root user and cannot set the vmnet adapter to promiscuous mode.
The permissions on /dev/vmnet* should be changed such that the user running VMware Workstation is allowed to modify the /dev/vmnet* devices. See the VMware KB for details.
To make the change permanent, edit /etc/init.d/vmware, and in function vmwareStartVmnet(), add commands to chgrp and chown the vmnet devices to a group which contains the user running VMware Workstation.
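A minimal sketch of the permission change might look like the following. The group name `vmnetusers`, the optional device directory argument, the `run` helper, and the `DRY_RUN` variable are all assumptions made for illustration. The `chmod g+rw` step is also an assumption: the text above mentions chgrp and chown, but group ownership alone may not grant write access if the device mode is restrictive. Commands are printed rather than executed unless `DRY_RUN=0` is set.

```sh
#!/bin/sh
# Sketch: give one group read/write access to the vmnet devices.
# Run as root; "vmnetusers" is a placeholder group containing the
# user who runs VMware Workstation.

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 = group name, $2 = device directory (defaults to /dev)
grant_vmnet_access() {
    group="$1"
    devdir="${2:-/dev}"
    for dev in "$devdir"/vmnet*; do
        [ -e "$dev" ] || continue   # the glob matched nothing
        run chgrp "$group" "$dev"
        run chmod g+rw "$dev"
    done
}

grant_vmnet_access "${VMNET_GROUP:-vmnetusers}"
```

Similar commands could be placed in vmwareStartVmnet() so the permissions are reapplied each time the vmnet devices are created.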
KVM+QEMU Issues
Be sure to use virtio (vtnet(4)) or e1000 (em(4)) NICs, not ed(4) NICs; otherwise CARP VIPs will never leave init state.
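One quick way to check for this from the firewall's shell is to look at the interface names, since FreeBSD names interfaces after their driver. The helper below is a hypothetical sketch, not an existing tool: it simply prints any interface names that start with `ed` followed by a digit.

```sh
#!/bin/sh
# Sketch: list interfaces whose names mark them as ed(4) devices.
# FreeBSD names interfaces after their driver, so ed(4) NICs appear as ed0, ed1, ...

find_ed_nics() {
    for nic in "$@"; do
        case "$nic" in
            ed[0-9]*) echo "$nic" ;;
        esac
    done
}

# On the firewall itself: find_ed_nics $(ifconfig -l)
# Illustrative call with made-up interface names:
find_ed_nics vtnet0 ed0 em1 lo0
```

Any name it prints indicates a NIC that should be changed in the VM definition, typically to the `virtio-net-pci` or `e1000` device model on the QEMU side.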
VirtualBox Issues
From this thread:
Setting Promiscuous mode: Allow All on the relevant interfaces of the VM allows CARP to function on any interface type (Bridged, Host-Only, Internal)
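The same setting can likely be applied from the command line with `VBoxManage`, as sketched below. The VM name `pfsense-primary`, the adapter numbers, the `run` helper, and the `DRY_RUN` variable are placeholders. `modifyvm` expects the VM to be powered off; for a running VM, `VBoxManage controlvm` offers a comparable `nicpromisc<N>` subcommand. Commands are printed rather than executed unless `DRY_RUN=0` is set.

```sh
#!/bin/sh
# Sketch: set "Promiscuous mode: Allow All" on selected adapters of a VM.
# VM name and adapter numbers are placeholders.

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 = VM name, remaining arguments = adapter numbers (1-based)
allow_all_promisc() {
    vm="$1"
    shift
    for n in "$@"; do
        run VBoxManage modifyvm "$vm" "--nicpromisc$n" allow-all
    done
}

allow_all_promisc pfsense-primary 1 2
```

Each firewall VM in the cluster would need the same change on every adapter that carries a CARP VIP.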