Cryptographic Accelerator Support

Cryptographic acceleration is available on some platforms, typically either in the CPU, as with AES-NI, or built into the board, as with the accelerators on Netgate ARM-based systems. Most cryptographic accelerator hardware supported by FreeBSD will work, provided the drivers are in the kernel or available as loadable modules.

Note

Some modules and hardware are only supported by pfSense® Plus software.

Supported Devices

Currently supported cryptographic accelerator devices include:

AES-NI

Supported natively by most modern CPUs.
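
To verify that a given CPU advertises AES-NI before selecting the module, the CPU feature flags recorded at boot can be checked from a shell. This is a quick sketch; the exact flag layout varies by CPU and FreeBSD version, but CPUs with AES-NI support typically list AESNI among the Features2 flags:

# grep AESNI /var/run/dmesg.boot

If the command prints a Features2 line containing AESNI, the CPU supports AES-NI.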

IPsec Multi-Buffer (IPsec-MB, IIMB) Cryptographic Acceleration [Plus only]

The Intel® Multi-Buffer Crypto for IPsec Library, often shortened to IPsec-MB or IIMB.

IPsec-MB assists VPN performance by replacing the cryptographic functions provided by the kernel for AES-CBC, AES-GCM, and ChaCha20-Poly1305 with accelerated functions that utilize the optimal CPU SIMD (Single Instruction, Multiple Data) instruction set, such as SSE, AVX, AVX2, or AVX512 (Advanced Vector Extensions). As such, this feature requires a CPU which supports at least one of these instruction sets, but they are common on current hardware.
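
To see which of these SIMD instruction sets a particular CPU advertises, the boot-time CPU feature flags can be inspected from a shell. This is only a rough check; flag names and line layout vary by CPU and FreeBSD version, but AVX typically appears in the Features2 line, while AVX2 and AVX512F appear in the Structured Extended Features lines:

# grep -E 'Features2|Structured Extended Features' /var/run/dmesg.boot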

This offers faster speeds and lower CPU utilization not only for IPsec but for any VPN which utilizes the accelerated algorithms in the kernel, including OpenVPN DCO and WireGuard. Exact performance varies by hardware, workload, and available CPU instruction sets. IPsec-MB is faster than AES-NI and can even meet or exceed the performance of dedicated acceleration hardware such as QAT on current versions of pfSense software.

IPsec-MB can be loaded alongside other cryptographic modules without conflicting, so it is separate from the other options. That said, when it is enabled it will take over acceleration of all its supported algorithms even if other options could potentially be faster (e.g. QAT).

There are several aspects of IPsec-MB behavior which can be fine-tuned. See Tuning IPsec-MB for details.

Intel QuickAssist Technology (QAT) [Plus only]

Intel QuickAssist Technology (QAT) accelerates many types of AES and SHA operations, such as AES-GCM encryption, and QAT is ideal for use with IPsec and OpenVPN DCO. It is currently the fastest acceleration option for the algorithms it supports.

QAT devices are supported on certain Intel-based platforms such as select models of C3000 and C2000 SoCs, as well as by QAT add-on cards. Several Netgate hardware models include QAT devices, such as the 4100, 5100, 6100, 7100, 8200, and more.
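
One way to check whether a platform contains a QAT device before enabling the module is to list PCI devices from a shell. This is only a sketch; the device description printed by pciconf comes from the PCI ID database on the firewall and may not contain this exact string for every QAT device:

# pciconf -lv | grep -B 3 -i quickassist

If nothing matches, dmesg | grep qat after loading the module is another indicator, as described under Confirming Accelerator Use.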

CESA [Plus only]

Present on some ARM platforms such as the Netgate 3100.

SafeXcel [Plus only]

Present on some ARM platforms such as the Netgate 2100 and 1100.

Note

For specifics on which hardware accelerators are available on Netgate hardware, and relevant performance data, visit the Netgate Store.

Activating the Hardware

Some hardware acceleration is active at all times and there is no way to disable it, short of removing the crypto card if it is a hardware add-on. For example, CESA acceleration cannot be disabled because it is an integrated feature of the system and the drivers are present in the kernel.

Others, such as QAT, IPsec-MB, AES-NI, or SafeXcel, require choosing the appropriate module under System > Advanced on the Miscellaneous tab (see Cryptographic & Thermal Hardware). Choose the module which matches the hardware for Cryptographic Hardware and then Save. The module will be loaded and available immediately.
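
To double-check from a shell that the selected module loaded after saving, list the loaded kernel modules. The names shown here (aesni, qat, safexcel) are typical examples and may vary by platform and software version; drivers compiled into the kernel will only appear with kldstat -v:

# kldstat | grep -E 'aesni|qat|safexcel'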

To deactivate a loaded module, select None for Cryptographic Hardware, Save, and then reboot the system.

Confirming Accelerator Use

Confirming that the cryptographic acceleration device is being used by the firewall can be tricky, depending on the hardware in question.

Most often the evidence of cryptographic accelerator use is apparent in one or more of the following observations:

  • Increased VPN throughput

  • Decreased system load (e.g. CPU utilization) for similar levels of VPN throughput

In cases where it is not clear, some cryptographic accelerators show signs of use through interrupt activity, which can be checked with vmstat -i | grep <name>, where <name> corresponds to the name of the device:

QAT

Use the shell command vmstat -i | grep qat

CESA

Use the shell command vmstat -i | grep cesa

SafeXcel

Use the shell command vmstat -i | grep safexcel

In each of these cases, first check that there is any output at all. If the device has not been used at all since the firewall last rebooted or loaded the device driver, there will be no output from the command.

Note

To verify the driver, check kldstat -v | grep <name> to ensure the driver is present, and check dmesg | grep <name> to see whether the device was detected.

If there is output from vmstat -i for the device, check the third entry on the line, which is the total number of interrupts observed on the device(s). If this number is increasing with VPN activity, the device is being used by the firewall. For example:

# vmstat -i | grep qat
irq300: qat0                     5481147          3

In that output the 5481147 number represents the number of interrupts on the qat0 device. Run the command again after transferring data across the VPN, and compare the number.
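
One way to capture a before-and-after comparison while a transfer is running is to take two samples in a single command line, for example (adjust the grep pattern to match the device and the interval to cover the transfer):

# vmstat -i | grep qat ; sleep 30 ; vmstat -i | grep qat

If the interrupt total in the second sample is higher than in the first, the device handled cryptographic operations during that window.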

Note

If the command produces no output at all, the device is not being used or the device driver is not loaded.

Practical Use

IPsec

IPsec will take advantage of acceleration automatically when an active accelerator supports the cipher chosen for a tunnel. For QAT and AES-NI, the optimal cipher choice is AES-GCM.

OpenVPN

To take advantage of acceleration in OpenVPN, choose a cipher which is supported by the available acceleration hardware, such as AES-256-GCM.

When using OpenVPN in DCO mode on pfSense Plus software, OpenVPN can use QAT or IPsec-MB to accelerate its encryption automatically, assuming those features are enabled (the QAT and/or IPsec-MB modules are loaded and active on supported hardware). If the hardware does not support QAT or IPsec-MB but does support AES-NI, ensure the AES-NI module is loaded; otherwise DCO mode cannot use AES-NI.

In non-DCO mode, such as on pfSense CE, no module needs to be selected for OpenVPN to utilize AES-NI. The OpenSSL engine has its own code for handling AES-NI in this mode which works well without using additional modules.
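
One way to observe the effect of the OpenSSL AES-NI code path from a shell is to compare openssl speed results with and without AES-NI. The OPENSSL_ia32cap mask below is a commonly used value that clears the AES-NI capability bit for the benchmark run only; treat it as an illustration and verify it against the OpenSSL documentation for the version in use:

# openssl speed -elapsed -evp aes-256-gcm
# env OPENSSL_ia32cap="~0x200000000000000" openssl speed -elapsed -evp aes-256-gcm

A large drop in throughput for the second run indicates that OpenSSL was using AES-NI for the first.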

Tuning IPsec-MB

The behavior of IPsec-MB can be tuned using several system tunables which can be configured under System Tunables:

kern.crypto.iimb.enable_aescbc

Enables handling of AES-CBC. IIMB can be slower than QAT for AES-CBC, so this toggle makes it possible to disable AES-CBC handling while still accelerating other algorithms, allowing IPsec-MB and QAT to coexist in such environments. Supported on x86-64 only.

Default is enabled (1). To disable, set a value of 0.
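
The current value can also be read, and changed for testing, from a shell with sysctl, assuming the IPsec-MB module is loaded (the kern.crypto.iimb nodes only exist while the module is active, and some may be read-only at runtime). Changes made this way do not persist across reboots; use a System Tunables entry for a permanent setting:

# sysctl kern.crypto.iimb.enable_aescbc
# sysctl kern.crypto.iimb.enable_aescbc=0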

kern.crypto.iimb.enable_multiq

Uses multiple queues to handle encryption jobs, i.e. each session is bound to a job thread. There are only a small number of job threads available:

  • 1 thread for < 4 CPUs

  • 2 threads for < 8 CPUs

  • 4 threads for >= 8 CPUs

Default is enabled (1). To disable, set a value of 0.

kern.crypto.iimb.use_task

Uses a separate taskq for running the encryption job completion callbacks. The callbacks are functions in the VPN code that send the packets on to the next step, e.g. ip_output or netisr_queue for input into the local stack. This option helps on high-performance systems (fast CPU, fast NICs).

Default is disabled (0). To enable, set a value of 1.

Developer Tunables

The following tunables are typically only needed during development or debugging:

kern.crypto.iimb.arch

Used to override the SIMD architecture on x86. By default it uses the best instruction set available on the CPU. This option allows benchmarking and comparing performance across the different instruction sets.

Options: auto (default), sse, avx, avx2, avx512.
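
For example, to compare SSE against the automatically selected instruction set, the tunable could be switched between benchmark runs (a sketch; if the value cannot be changed while the module is in use, set it via System Tunables and reboot between runs):

# sysctl kern.crypto.iimb.arch=sse
(run a VPN throughput test, then restore the default)
# sysctl kern.crypto.iimb.arch=auto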

kern.crypto.iimb.prefetch

Pre-fetches encryption keys before calling the crypto function. This might help with micro-level performance, but thus far it has not led to any significant measurable differences.

Default is enabled (1). To disable, set a value of 0.

kern.crypto.iimb.max_jobs

Maximum number of batched jobs. The IIMB thread will collect up to this many jobs and handle them in a batch. The maximum is 256, but it can be tuned to a smaller value. The limit is 256 because larger batches add latency to the network stack, and thus far developers have not seen larger values lead to any significant measurable differences in performance.

Default value is 256.