Verifying Failover Functionality

Because the purpose of HA is to keep services available through outages, thoroughly test a cluster before placing it into production. The most important part of that testing is confirming that the HA peers fail over gracefully during outages.

If any actions in this section do not work as expected, see Troubleshooting High Availability.

Check CARP status

On both nodes, navigate to Status > CARP (failover). If everything is working correctly, the primary will show MASTER for the status of all CARP VIPs and the secondary will show BACKUP.

../_images/ha-status-pri.png

HA CARP and State Synchronization Status (Primary Node)

If either node shows DISABLED, click the Enable CARP button, then refresh the page.

If an interface shows INIT, the interface containing the CARP VIP does not have a link. Connect the interface to a switch, or at least to the other node. If the interface will not be used for some time, remove the CARP VIP from the interface, because a VIP left in the INIT state will interfere with normal CARP operation.
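The same check can be scripted against saved interface output. The following is a minimal sketch, not part of the GUI workflow above. It assumes the text comes from the FreeBSD ifconfig command, which reports CARP VIPs on lines of the form "carp: MASTER vhid 1 advbase 1 advskew 0". The function names and the sample text are hypothetical and for illustration only.

```python
import re

# Assumed line format: "carp: MASTER vhid 1 advbase 1 advskew 0".
CARP_LINE = re.compile(r"carp:\s+(\w+)\s+vhid\s+(\d+)")


def carp_states(ifconfig_text):
    """Return a list of (vhid, state) pairs, one per CARP line found."""
    return [(int(m.group(2)), m.group(1))
            for m in CARP_LINE.finditer(ifconfig_text)]


def expected_role_ok(states, is_primary):
    """True when every VIP holds the state expected for this node's role."""
    wanted = "MASTER" if is_primary else "BACKUP"
    return bool(states) and all(state == wanted for _, state in states)


# Illustrative sample only; not captured from a real system.
SAMPLE = (
    "\tcarp: MASTER vhid 1 advbase 1 advskew 0\n"
    "\tcarp: INIT vhid 2 advbase 1 advskew 0\n"
)

if __name__ == "__main__":
    states = carp_states(SAMPLE)
    for vhid, state in states:
        print(f"vhid {vhid}: {state}")
    print("primary healthy:", expected_role_ok(states, is_primary=True))
```

In this sample the second VIP is in INIT, so the role check fails for the primary, which matches the no-link condition described above.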

Check State Synchronization

The Status > CARP page includes State Synchronization Status which lists Filter Host ID values for entries in the state table. If the Filter Host ID in the High Availability settings has been changed recently, it may show both old and new values from the primary and secondary nodes. Over time the list should only reflect the current values of the Filter Host ID of each node in the cluster.

If the lists on both nodes are identical or nearly identical, then state synchronization is working. If the list on a node does not contain an entry for the Filter Host ID of the other node, then states are not being synchronized.
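The comparison described above amounts to simple set logic. The sketch below assumes the Filter Host ID values have been copied by hand from the status page of a node. The function name and the ID strings are hypothetical placeholders, not values produced by any real system.

```python
def sync_report(local_id, peer_id, listed_ids):
    """Classify the Filter Host ID list observed on one node.

    peer_present: the other node's ID appears, so states are arriving.
    stale_ids:    IDs belonging to neither node, such as an old value
                  left over after the Filter Host ID was changed.
    """
    listed = set(listed_ids)
    return {
        "peer_present": peer_id in listed,
        "stale_ids": sorted(listed - {local_id, peer_id}),
    }


if __name__ == "__main__":
    # Hypothetical IDs for illustration only.
    report = sync_report("0000000a", "0000000b",
                         ["0000000a", "0000000b", "deadbeef"])
    print(report)
```

A report with peer_present set to False corresponds to the failure case above. Stale IDs alone are expected shortly after a Filter Host ID change and should disappear over time.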

Check Configuration Replication

Navigate to key locations on the secondary node, such as Firewall > Rules and Firewall > NAT and ensure that rules created only on the primary node are being replicated to the secondary node.

If the example earlier in this chapter was followed, the “temp” firewall rule on the pfsync interface of the secondary node will have been replaced by the rule from the primary.

Check DHCP Failover Status

If DHCP failover was configured, its status can be checked at Status > DHCP Leases and Status > DHCPv6 Leases.

The exact appearance of the status depends on the DHCP service backend.

Kea DHCP Failover Status

Failover status is in a section at the bottom of the DHCP and DHCPv6 Leases pages, as in figure Kea DHCP Failover Status - Primary node, both online. The failover status works identically for both DHCP and DHCPv6.

../_images/ha-dhcp-kea-status-pri-good.png

Kea DHCP Failover Status - Primary node, both online.

See also

See High Availability Status – Kea DHCP Only for more details about this section.

ISC DHCP Failover Status

Failover status is in a section at the top of the DHCP Leases page. This section contains the status of all DHCP Failover pools, as in Figure ISC DHCP Failover Pool Status.

../_images/ha-dhcp-isc-status.png

ISC DHCP Failover Pool Status

See also

See Pool Status (HA/Failover) – ISC DHCP Only for more details about this section.

Test CARP Failover

Now for the real failover test. Before starting, make sure that a local client behind the CARP pair on LAN can connect to the Internet with both nodes online and running. Once that is confirmed to work, it is an excellent time to make a backup.

For the actual test, unplug the primary node from the network or shut it down temporarily. The client should be able to keep loading content from the Internet through the secondary node. Check Status > CARP (failover) again on the secondary node, which should now report that it is MASTER for the LAN and WAN CARP VIPs.

Now bring the primary node back online. It will regain its role as MASTER, and the secondary node will demote itself to BACKUP once again. Internet connectivity should continue to work throughout this process.

Test the HA pair in as many failure scenarios as possible. Additional tests include:

  • Unplug the WAN or LAN cable

  • Pull the power plug of the primary

  • Disable CARP on the primary using both the temporary disable feature and maintenance mode

  • Test with each system individually (power off secondary, then power back on and shut down the primary)

  • Download a file or try streaming audio/video during the failover

  • Run a continuous ICMP echo request (ping) to an Internet host during the failover
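For the continuous connectivity tests in the list above, it helps to record exactly when connectivity dropped and for how long. The following sketch is run from a LAN client. It swaps in repeated TCP connection attempts for ICMP echo requests, because sending raw ICMP from a script usually requires elevated privileges. The target host, port, duration, and all function names are assumptions for illustration.

```python
import socket
import time


def probe(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def outages(samples):
    """Turn [(timestamp, ok), ...] into [(start, end), ...] outage windows.

    A window opens at the first failed sample and closes at the next
    successful one. A window still open at the end of the samples is
    closed at the last timestamp.
    """
    windows = []
    start = None
    last = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            windows.append((start, ts))
            start = None
        last = ts
    if start is not None:
        windows.append((start, last))
    return windows


def monitor(host, port, duration, interval=0.5):
    """Probe repeatedly for `duration` seconds and return the samples."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append((time.monotonic(), probe(host, port)))
        time.sleep(interval)
    return samples


if __name__ == "__main__":
    # Hypothetical target; substitute any reachable Internet host.
    results = monitor("example.com", 80, duration=60)
    for start, end in outages(results):
        print(f"outage lasting about {end - start:.1f} s")
```

Start the script, trigger the failover while it runs, and review the reported windows afterward. An empty report means no probe failed during the test. Because new TCP connections are opened for each probe, this measures reachability through the CARP VIP, not whether existing states survived the failover; the file download and streaming tests cover that case.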