Memory Usage and Tuning

The TNSR dataplane consumes memory for a variety of reasons, and as one might expect, memory requirements increase depending on the workload.

For the sake of maximum speed, the dataplane does not perform checks and calculations each time it attempts to allocate additional memory. As a result, it will crash when it runs out of memory. Since that is not a desirable outcome in production, the best practice is to determine the proper memory needs before deploying, which includes testing under a simulated workload comparable to the real production environment.

This document serves as a guide for determining how much memory the dataplane will use in a variety of scenarios as well as testing to determine if the chosen sizes are sufficient for a given workload.

The default values are sufficient in cases where there are a small number of routes in the routing table (e.g. fewer than 10,000) and for some cases above that level as well. Tuning is primarily required for environments where the router will have over 100,000 routes in the routing table, but the specific level depends on the TNSR configuration, hardware, and environment.

Tip

If there is any uncertainty, the testing procedures laid out in this document can help determine if tuning is necessary. See Testing and Validating Memory Requirements.

This document covers memory tuning but there are also CPU usage concerns, especially when using large numbers of routes with dynamic routing. See Working with Large BGP Tables for details and CPU Workers and Affinity for information on configuring additional CPU workers.

Page Size

The default memory page size in Linux is 4 kilobytes, which can lead to delays in large memory allocations because the system has to work with small chunks of memory at a time. The current best practice is to use a page size of 2 megabytes instead:

tnsr(config)# dataplane memory main-heap-page-size 2m

See also

Memory

Tip

For environments with large RAM requirements (large volumes of routes, NAT sessions, etc), in addition to increasing the page size, consider also increasing the main heap size (Main Heap Memory Sizing, Memory) and huge pages allocations (Host Memory Management Configuration).

Routing

When handling large numbers of routes in the TNSR FIB, typically from BGP peers, there are multiple considerations when calculating the correct memory size parameters. These include:

  • Number of worker threads

  • Number of routes

  • Address family of routes (IPv4 or IPv6)

  • Prefix length of IPv4 routes

These are explained in more detail in the next sections.

The primary values which may need to be adjusted are:

  • Statistics segment memory size, which holds counters for values in the route tables.

  • Main heap memory size, which holds the actual routing tables.

  • Linux-cp netlink socket buffer size, which exchanges routes between the dataplane and operating system.

Statistics Segment Memory Sizing

Statistics segment memory usage increases proportionally for each worker thread because each worker thread maintains its own separate counters. This means that the total amount of memory allocated to the statistics segment is divided equally between all workers. Therefore, any increase in worker threads must be accompanied by a corresponding increase in statistics segment memory size to handle the same number of routes.

As mentioned in Statistics Segment, the formula for calculating a ballpark value for the statistics segment memory size is <routes> * <threads> * 2 counters * 16 Bytes. While that is a good baseline value, the table in Maximum Route Counts by Statistics Segment Size and Number of Workers was created from simulated load testing (Testing and Validating Memory Requirements) and is closer to real-world experience. Use it as a guide to choose an appropriate statistics segment memory size for a given number of workers and expected total number of routes.

Maximum Route Counts by Statistics Segment Size and Number of Workers

          Statistics Segment Size
Workers   96 MB*    128 MB    256 MB    512 MB    1 GB      2 GB
0         1.8M      1.8M      4.5M      9.1M      15.7M
1         895.1K    1.0M      2.1M      4.7M      10.2M     17.7M
2         578.3K    827.5K    1.6M      3.2M      7.2M      14.6M
3         474.7K    631.7K    1.3M      3.0M      4.8M      10.6M
4         420.9K    497.9K    1.1M      2.2M      4.4M      8.5M
5         324.8K    487.2K    907.7K    1.9M      3.6M      6.9M
6         300.0K    421.7K    809.2K    1.5M      3.3M      6.3M

Note

* denotes the default allocation size.

Example

For example, consider a router with 4 worker threads that will receive a full BGP feed from an upstream peer. As of this writing a full BGP feed may consist of approximately 900,000 IPv4 prefixes and 140,000 IPv6 prefixes for a total of around 1,040,000 routes. These numbers are rounded up a bit to give some extra headroom for expansion, and should likely be increased further. If the router needs to handle approximately 1.1M routes with 4 workers, it will need a minimum of 256MB allocated to the statistics segment:

tnsr(config)# dataplane statseg heap-size 256M
tnsr(config)# service dataplane restart
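
The baseline formula quoted earlier in this section can be worked through in a few lines of Python. This is only a sketch of the arithmetic: the function name is made up for illustration, and the thread count is passed in as given rather than derived, since how <threads> relates to the configured worker count is not spelled out here.

```python
def statseg_baseline_bytes(routes: int, threads: int,
                           counters_per_route: int = 2,
                           bytes_per_counter: int = 16) -> int:
    """Ballpark from the text: <routes> * <threads> * 2 counters * 16 Bytes."""
    return routes * threads * counters_per_route * bytes_per_counter


# Values from the example above: about 1.1M routes and 4 threads.
baseline = statseg_baseline_bytes(routes=1_100_000, threads=4)
print(f"{baseline} bytes (about {baseline / 2**20:.1f} MiB)")
```

The result is a baseline only. The example above selects 256MB because the load-tested table is the better guide when the two disagree.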

Main Heap Memory Sizing

Dataplane main heap memory usage for routes in the IPv4 and IPv6 FIBs is not impacted by adding worker threads as there is only a single copy of each FIB in memory.

IPv4 FIB memory usage varies more than statistics segment memory usage. Since it uses the main heap, memory which is dynamically allocated for other objects in VPP at runtime can impact the amount of memory that can be used to store routes in the FIB. IPv6 FIB memory usage also varies more than the statistics segment, but less than IPv4 FIB.

IPv4 FIB memory usage varies based on the length of the prefix. This is due to the design of the data structure which is used to store IPv4 routes. Routes with longer masks can cause more memory to be allocated than routes with shorter masks. For example, storing a /25 prefix requires more memory to be allocated than storing a /24 prefix. IPv6 FIB memory usage is not affected by the length of a prefix. There was no difference in memory usage between IPv6 routes with different mask lengths.

Given those factors, the tables Maximum IPv4 Route Counts by Heap Size and Prefix Length and Maximum IPv6 Route Counts by Heap Size can aid in determining a minimum main heap size which can accommodate the desired number of routes in the FIB.

Maximum IPv4 Route Counts by Heap Size and Prefix Length

                Main Heap Size
Prefix length   1 GB*     2 GB      4 GB      6 GB      8 GB      10 GB
<= 24           2.25M     4.10M     8.86M     13.16M    17.87M    25.11M
25              403k      984k      1.84M     2.78M     4.19M     4.13M
26              719k      1.62M     3.67M     5.47M     8.24M     8.40M
27              1.47M     2.21M     4.89M     10.38M    10.96M    10.96M
28              1.75M     3.53M     6.48M     11.99M    14.91M    22.03M
29              2.05M     3.92M     9.66M     13.06M    16.71M    22.44M

Maximum IPv6 Route Counts by Heap Size

Main Heap     1 GB*     2 GB      4 GB      6 GB      8 GB      10 GB
IPv6 Routes   2.05M     3.76M     7.47M     11.52M    15.57M    22.21M

Note

* denotes the default allocation size.

Tip

As mentioned in Memory, increasing the main heap size beyond the default huge page allocation of 2GB may require increasing huge pages as well. See Host Memory Management Configuration for details.

Also consider increasing the page size to avoid delays in memory allocation. See Page Size for details.

Example

Continuing the previous example of 900,000 IPv4 prefixes and 140,000 IPv6 prefixes, going by the worst case scenario of every IPv4 route being a /25, that translates to approximately 4GB of main heap for IPv4 and 1GB for IPv6. Since other parts of the dataplane consume main heap memory as well, 6GB is a reasonable minimum for that scenario:

tnsr(config)# dataplane memory main-heap-size 6G
tnsr(config)# service dataplane restart
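
The table lookups used in this example can be sketched in Python. The capacities below are copied from the two main heap tables in this section; the function name and the headroom parameter are hypothetical helpers added for illustration and are not part of TNSR.

```python
HEAP_SIZES_GB = [1, 2, 4, 6, 8, 10]

# Maximum IPv4 routes per heap size, keyed by prefix length.
# Key 24 stands for the "<= 24" row of the table.
IPV4_MAX_ROUTES = {
    24: [2_250_000, 4_100_000, 8_860_000, 13_160_000, 17_870_000, 25_110_000],
    25: [403_000, 984_000, 1_840_000, 2_780_000, 4_190_000, 4_130_000],
    26: [719_000, 1_620_000, 3_670_000, 5_470_000, 8_240_000, 8_400_000],
    27: [1_470_000, 2_210_000, 4_890_000, 10_380_000, 10_960_000, 10_960_000],
    28: [1_750_000, 3_530_000, 6_480_000, 11_990_000, 14_910_000, 22_030_000],
    29: [2_050_000, 3_920_000, 9_660_000, 13_060_000, 16_710_000, 22_440_000],
}
IPV6_MAX_ROUTES = [2_050_000, 3_760_000, 7_470_000,
                   11_520_000, 15_570_000, 22_210_000]


def min_heap_gb(routes, capacities, headroom=1.0):
    """Return the smallest tabulated heap size (GB) whose capacity covers
    routes * headroom, or None if no tabulated size is large enough."""
    needed = routes * headroom
    for size_gb, capacity in zip(HEAP_SIZES_GB, capacities):
        if capacity >= needed:
            return size_gb
    return None


ipv4_exact = min_heap_gb(900_000, IPV4_MAX_ROUTES[25])
ipv4_padded = min_heap_gb(900_000, IPV4_MAX_ROUTES[25], headroom=1.25)
ipv6_exact = min_heap_gb(140_000, IPV6_MAX_ROUTES)
print(ipv4_exact, ipv4_padded, ipv6_exact)
```

A plain lookup for 900,000 /25 routes lands on the 2 GB column with little margin (984k), while adding some headroom moves it to 4 GB, which is in line with the approximate figure used in the example. The result covers FIB capacity only and does not account for other consumers of the main heap.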

Testing and Validating Memory Requirements

TNSR includes a route testing utility at /usr/bin/route-test. This utility adds IPv4 or IPv6 routes quickly via netlink, which is the same method used by the dynamic routing daemon (FRR/zebra) to add routes it receives via BGP.

This utility can aid in validating memory parameters and help in tuning linux-cp parameters such as the netlink socket buffer size (Linux-cp Configuration).

For IPv4 routes, the default behavior of the utility is to add /24 routes sequentially starting at 1.0.0.0/24. It skips the loopback prefix (127/8) and the prefix which contains the gateway address used with the routes. It stops when it reaches the start of multicast address space (224/8).

For IPv6 routes, the default behavior is to add /64 routes sequentially starting at 2000::/64. It skips the prefix which contains the gateway address used with the routes and stops when it reaches the end of global unicast address space (4000::/3).

After selecting appropriate sizes for the statistics segment (Statistics Segment Memory Sizing) and main heap (Main Heap Memory Sizing) based on the tables in those sections, use route-test to add the expected number of routes. This process will validate that the memory allocations are sufficient to support that number of routes.

Route Test Utility Usage

The syntax for this utility is:

# /usr/bin/route-test -h
/usr/bin/route-test -g <gateway_address> -n <num_routes> [-h] [-6] [-l <len>]
    -h - Display this message
    -6 - Add IPv6 routes (IPv4 by default)
    -n <number_of_routes>
    -g <gateway_address>
    -l <prefix_length>

To use the utility, supply a gateway address and a number of routes to add. For example, the following command will add 1M routes which use 198.51.100.2 as the next-hop/gateway address.

$ sudo dp-exec route-test -g 198.51.100.2 -n 1000000

Note

For the routes to be added successfully, TNSR must be configured so that the next hop address can be resolved. In this example, TNSR must know how to reach 198.51.100.2. This could be accomplished by configuring 198.51.100.1/24 on an interface and bringing it up.

The routes are added to the Linux kernel route table via netlink, thus the program must be run as a privileged user, which is why the example command is run via sudo. Alternately, it could be run in a root shell without sudo.

The utility must be run in the dataplane network namespace for the routes to be added to the dataplane FIB by the linux-nl plugin, which is the reason to run it using dp-exec (Namespaces in Shell Commands). The dp-exec command can be omitted by opening a shell in the dataplane namespace from the TNSR CLI:

tnsr# dataplane shell sudo bash
# route-test -g 198.51.100.2 -n 1000000

Or:

tnsr# dataplane shell sudo route-test -g 198.51.100.2 -n 1000000

The utility adds /24 routes by default for IPv4. There is a finite number of unicast /24 prefixes available (around 14.5M) as shown in Counts of unicast prefixes. Routes with other prefix lengths can be added via the -l <len> argument. The argument -l 25 will instruct the utility to add /25 routes (1.0.0.0/25, 1.0.0.128/25, 1.0.1.0/25, etc.) instead.

Counts of unicast prefixes

Prefix length   Available unicast prefixes
24              14.54M
23              7.27M
22              3.63M
21              1.81M
20              909k
19              454k
18              227k
17              113k
16              56k
15              28k
14              14k
13              7k
12              3551
11              1775
10              887
9               443
8               221
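
The counts in Counts of unicast prefixes appear to follow directly from the default behavior described earlier: start at 1.0.0.0, stop at 224/8, skip 127/8, and skip the one prefix that contains the gateway. The following Python sketch reproduces them under those assumptions. The function name is hypothetical, and the sketch assumes the gateway sits in unicast space outside 127/8, as 198.51.100.2 does.

```python
def available_unicast_prefixes(prefix_len: int) -> int:
    """Count prefixes of the given length between 1.0.0.0 and 224.0.0.0,
    excluding 127/8 and the single prefix containing the gateway."""
    if not 8 <= prefix_len <= 32:
        raise ValueError("this sketch only covers prefix lengths 8 through 32")
    per_slash8 = 2 ** (prefix_len - 8)   # prefixes of this length inside one /8
    total = 223 * per_slash8             # 1/8 through 223/8
    loopback = per_slash8                # everything inside 127/8
    gateway = 1                          # the prefix holding the next hop
    return total - loopback - gateway


for length in (24, 20, 12, 8):
    print(length, available_unicast_prefixes(length))
```

For lengths 8 through 12 the results match the exact table entries, and for longer prefixes they match the rounded values (for example 14,548,991 for /24, listed as 14.54M).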

If the expected distribution of routes by prefix length is known (e.g. 2M total routes consisting of 1M /24, 500k /23, 250k /22, and 250k /21), the program can be run several times in succession with different values of -l <len> to simulate that distribution. This is a valuable exercise due to the way data is structured in the main heap to optimize the speed of FIB lookups. Routes with a longer prefix length may consume more memory on the main heap than routes with a shorter length. For example, a /27 route may cause additional memory to be consumed beyond what is required for a /24 route. This behavior applies only to IPv4 routes in the main heap, not to the statistics segment or IPv6 routes.
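
As a rough illustration of running the utility once per prefix length, the Python sketch below builds one route-test command line per entry in a distribution. It only prints the commands and does not execute them. The function name is hypothetical, and the only flags used are the -g, -n, and -l options from the usage output shown earlier.

```python
def route_test_commands(gateway: str, distribution: dict) -> list:
    """Build one route-test invocation per (prefix length, route count) pair."""
    commands = []
    for prefix_len, count in distribution.items():
        commands.append([
            "sudo", "dp-exec", "route-test",
            "-g", gateway,
            "-n", str(count),
            "-l", str(prefix_len),
        ])
    return commands


# The example distribution from the text: 2M routes in total.
distribution = {24: 1_000_000, 23: 500_000, 22: 250_000, 21: 250_000}
commands = route_test_commands("198.51.100.2", distribution)
for command in commands:
    print(" ".join(command))
```

Each printed line can then be run in succession from a shell, with the same next-hop reachability requirements noted for the single-run example.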

Tip

The best practice is to validate memory parameters using a distribution similar to what will be seen in production use if that data is available, or to use the worst case. If /27 routes are the longest prefix length expected to be received via BGP, use -l 27 to add /27 routes in order to validate the memory allocations.

Interpreting Test Results

When all iterations of route-test are complete, validate that routes were added to the FIB by running sudo vppctl show ip fib summary from a shell. This will display the counts of IPv4 routes of each length. sudo vppctl show ip6 fib summary shows similar statistics for IPv6 routes, though memory consumption is not tied to prefix length for IPv6 routes the way it is for IPv4 routes.

If the dataplane (VPP) crashes while running route-test, add 25% to the size of the main heap (Main Heap Memory Sizing) and statistics segment (Statistics Segment Memory Sizing) and repeat the test.

In addition to testing memory allocation, running this tool also exercises the Linux-cp netlink socket buffer. If sudo vppctl show ip fib summary or sudo vppctl show ip6 fib summary shows a lower count of routes than requested during the test, the netlink socket buffer may have overflowed, forcing the kernel to drop some of the route announcements it was trying to send. In that case, the socket buffer size may need to be increased.

In addition to checking the route counts, check the logs by running sudo vppctl show log and by inspecting the contents of /var/log/messages for error messages about the socket overflowing.

If the socket overflows during the tests, increase the size of the socket buffer (Linux-cp Netlink Socket Buffer Sizing).

NAT

Increasing the number of NAT sessions per thread (NAT Sizing Options) requires additional increases in main heap memory based on the number of worker threads and NAT mode (NAT Modes).

The amount of memory consumed per session depends on the NAT mode. Endpoint-dependent NAT mode consumes more memory per session than endpoint-independent mode. Total memory consumption increases in a linear manner as session limits increase, with each session consuming approximately the same amount of memory on average:

  • Endpoint-independent NAT mode: 228 Bytes per session

  • Endpoint-dependent NAT mode: 353 Bytes per session

Multiply the value for the NAT mode by the max-translations-per-thread NAT configuration value and the number of worker threads to reach a minimum safe starting value for the amount of memory required by NAT in the main heap.

<nat mode session size> * <max-translations-per-thread> * <workers>
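
The formula above can be expressed as a short Python sketch. The per-session sizes are the two values listed in this section; the function and dictionary names are hypothetical, and the sketch assumes at least one worker thread.

```python
# Approximate bytes per session for each NAT mode, from the list above.
NAT_BYTES_PER_SESSION = {
    "endpoint-independent": 228,
    "endpoint-dependent": 353,
}


def nat_heap_bytes(mode: str, max_translations_per_thread: int, workers: int) -> int:
    """<nat mode session size> * <max-translations-per-thread> * <workers>"""
    return NAT_BYTES_PER_SESSION[mode] * max_translations_per_thread * workers


# Endpoint-dependent mode, 1,000,000 translations per thread, 4 workers.
needed = nat_heap_bytes("endpoint-dependent", 1_000_000, 4)
print(f"{needed} bytes (about {needed / 1_000_000:.1f} M)")
```

The result is a minimum starting value for NAT alone and should be added to whatever the routing tables and the rest of the dataplane require from the main heap.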

The table NAT Memory by Sessions per Thread and NAT Mode below has memory usage values based on several single-thread session counts for easy estimation.

NAT Memory by Sessions per Thread and NAT Mode

Translations   NAT44 EI Mode   NAT44 ED Mode
128,000        29.2 M          45.2 M
256,000        58.4 M          90.4 M
512,000        116.7 M         180.7 M
1,000,000      228.0 M         353.0 M
2,000,000      456.0 M         706.0 M
4,000,000      912.0 M         1412.0 M

Note

This calculation only accounts for NAT. The main thread itself also uses main heap memory, as does the routing table. Thus, the actual requirement is likely to be higher than this calculated minimum.

An alternate tactic to reduce maximum session requirements and associated memory requirements is to reduce the NAT session timeout. Shorter sessions are removed from memory faster than longer sessions, and thus are less likely to exist concurrently with other sessions. The exact values depend upon the environment and types of connections passing through TNSR. See NAT Session Timeout Duration for details on the various timer values.

API Segment

In high volume environments with large amounts of route changes in a short time frame, it may be necessary to increase the amount of RAM the system dedicates to messages for the internal binary API (API Segment).

The API segment defaults are currently 64M for the global size and 16M for the API size. The global size must be larger than the API size, so when increasing the API size, increase the global size in a similar fashion.

For example, to increase the global size to 512M and the API size to 256M:

tnsr(config)# dataplane api-segment global-size 512M
tnsr(config)# dataplane api-segment api-size 256M
tnsr(config)# service dataplane restart