Memory Usage and Tuning¶
The TNSR dataplane consumes memory for a variety of reasons, and as one might expect, memory requirements increase depending on the workload.
For the sake of maximum speed, the dataplane will crash when it runs out of memory rather than performing checks and calculations each time it attempts to allocate additional memory. Since that is not a desirable outcome in production, the best practice is to determine the proper memory needs before deploying, which includes testing under a simulated workload comparable to the real production environment.
This document serves as a guide for determining how much memory the dataplane will use in a variety of scenarios as well as testing to determine if the chosen sizes are sufficient for a given workload.
The default values are sufficient in cases where there are a small number of routes in the routing table (e.g. less than 10,000) and for some cases above that level as well. Tuning is primarily required for environments where the router will have over 100,000 routes in the routing table, but the specific level depends on the TNSR configuration, hardware, and environment.
Tip
If there is any uncertainty, the testing procedures laid out in this document can help determine if tuning is necessary. See Testing and Validating Memory Requirements.
This document covers memory tuning but there are also CPU usage concerns, especially when using large numbers of routes with dynamic routing. See Working with Large BGP Tables for details and CPU Workers and Affinity for information on configuring additional CPU workers.
Page Size¶
The default memory page size in Linux is 4 kilobytes, which can lead to delays in large memory allocations as it has to work with small chunks of memory at a time. The current best practice is to use a page size of 2 megabytes instead:
tnsr(config)# dataplane memory main-heap-page-size 2m
Tip
For environments with large RAM requirements (large volumes of routes, NAT sessions, etc), in addition to increasing the page size, consider also increasing the main heap size (Main Heap Memory Sizing, Memory) and huge pages allocations (Host Memory Management Configuration).
Routing¶
When handling large numbers of routes in the TNSR FIB, typically from BGP peers, there are multiple considerations when calculating the correct memory size parameters. These include:
Number of worker threads
Number of routes
Address family of routes (IPv4 or IPv6)
Prefix length of IPv4 routes
These are explained in more detail in the next sections.
The primary values which may need to be adjusted are:
Statistics segment memory size, which holds counters for values in the route tables.
Main heap memory size, which holds the actual routing tables.
Linux-cp netlink socket buffer size, which exchanges routes between the dataplane and operating system.
Statistics Segment Memory Sizing¶
Statistics segment memory usage increases proportionally for each worker thread because each worker thread maintains its own separate counters. This means that the total amount of memory allocated to the statistics segment is divided equally between all workers. Therefore, any increase in worker threads must be accompanied by a corresponding increase in statistics segment memory size to handle the same number of routes.
As mentioned in Statistics Segment, the formula for calculating a ballpark value for the statistics segment memory size is <routes> * <threads> * 2 counters * 16 Bytes. While that is a good baseline value, the table in Maximum Route Counts by Statistics Segment Size and Number of Workers was created from simulated load testing (Testing and Validating Memory Requirements) that is closer to real-world experience and can be used as a guide to choose an appropriate statistics segment memory size for a given number of workers and expected total number of routes.
Maximum route counts by statistics segment size (columns) and number of workers (rows):
Workers | 96 MB* | 128 MB | 256 MB | 512 MB | 1 GB | 2 GB |
---|---|---|---|---|---|---|
0 | 1.8M | 1.8M | 4.5M | 9.1M | 15.7M | |
1 | 895.1K | 1.0M | 2.1M | 4.7M | 10.2M | 17.7M |
2 | 578.3K | 827.5K | 1.6M | 3.2M | 7.2M | 14.6M |
3 | 474.7K | 631.7K | 1.3M | 3.0M | 4.8M | 10.6M |
4 | 420.9K | 497.9K | 1.1M | 2.2M | 4.4M | 8.5M |
5 | 324.8K | 487.2K | 907.7K | 1.9M | 3.6M | 6.9M |
6 | 300.0K | 421.7K | 809.2K | 1.5M | 3.3M | 6.3M |
Note
An asterisk (*) denotes the default allocation size.
Example¶
For example, say a router will use 4 worker threads and needs to accept a full BGP feed from an upstream peer. As of this writing a full BGP feed may consist of approximately 900,000 IPv4 prefixes and 140,000 IPv6 prefixes for a total of around 1,040,000 routes. These numbers are rounded up a bit to give some extra headroom for expansion, and should likely be increased further. If a router needs to handle approximately 1.1M routes with 4 workers, it will need a minimum of 256MB allocated to the statistics segment:
tnsr(config)# dataplane statseg heap-size 256M
tnsr(config)# service dataplane restart
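The sizing logic in this section can be sketched in a few lines of Python. This is an illustrative helper, not part of TNSR: the function names are hypothetical, the route limits are copied from the table above, and the sketch leaves it to the caller to decide what counts as a thread in the baseline formula.

```python
# Baseline formula from this section: routes * threads * 2 counters * 16 bytes.
COUNTERS_PER_ROUTE = 2
BYTES_PER_COUNTER = 16

# Statistics segment sizes (MB) and measured maximum route counts per worker
# count, copied from the table above. None marks a cell with no value.
STATSEG_SIZES_MB = [96, 128, 256, 512, 1024, 2048]
MAX_ROUTES = {
    0: [1_800_000, 1_800_000, 4_500_000, 9_100_000, 15_700_000, None],
    1: [895_100, 1_000_000, 2_100_000, 4_700_000, 10_200_000, 17_700_000],
    2: [578_300, 827_500, 1_600_000, 3_200_000, 7_200_000, 14_600_000],
    3: [474_700, 631_700, 1_300_000, 3_000_000, 4_800_000, 10_600_000],
    4: [420_900, 497_900, 1_100_000, 2_200_000, 4_400_000, 8_500_000],
    5: [324_800, 487_200, 907_700, 1_900_000, 3_600_000, 6_900_000],
    6: [300_000, 421_700, 809_200, 1_500_000, 3_300_000, 6_300_000],
}


def baseline_statseg_bytes(routes: int, threads: int) -> int:
    """Ballpark statistics segment size in bytes from the baseline formula."""
    return routes * threads * COUNTERS_PER_ROUTE * BYTES_PER_COUNTER


def smallest_statseg_mb(routes: int, workers: int):
    """Smallest tabled size (MB) whose measured maximum covers `routes`.

    Returns None when no tabled size is large enough.
    """
    for size_mb, limit in zip(STATSEG_SIZES_MB, MAX_ROUTES[workers]):
        if limit is not None and limit >= routes:
            return size_mb
    return None


if __name__ == "__main__":
    print(baseline_statseg_bytes(1_100_000, 4))  # bytes, baseline formula
    print(smallest_statseg_mb(1_100_000, 4))     # MB, from the table
```

For the example of 1.1M routes and 4 workers, the table lookup returns 256, which matches the value configured above. Since the tabled values are measured maxima, choosing the next larger size is a reasonable way to leave headroom.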
Main Heap Memory Sizing¶
Dataplane main heap memory usage for routes in the IPv4 and IPv6 FIBs is not impacted by adding worker threads as there is only a single copy of each FIB in memory.
IPv4 FIB memory usage varies more than statistics segment memory usage. Since it uses the main heap, memory which is dynamically allocated for other objects in VPP at runtime can impact the amount of memory that can be used to store routes in the FIB. IPv6 FIB memory usage also varies more than the statistics segment, but less than IPv4 FIB.
IPv4 FIB memory usage varies based on the length of the prefix. This is due to the design of the data structure which is used to store IPv4 routes. Routes with longer masks can cause more memory to be allocated than routes with shorter masks. For example, storing a /25 prefix requires more memory to be allocated than storing a /24 prefix. IPv6 FIB memory usage is not affected by the length of a prefix. There was no difference in memory usage between IPv6 routes with different mask lengths.
Given those factors, the tables Maximum IPv4 Route Counts by Heap Size and Prefix Length and Maximum IPv6 Route Counts by Heap Size can aid in determining a minimum main heap size which can accommodate the desired number of routes in the FIB.
Maximum IPv4 route counts by main heap size (columns) and prefix length (rows):
Prefix length | 1 GB* | 2 GB | 4 GB | 6 GB | 8 GB | 10 GB |
---|---|---|---|---|---|---|
<= 24 | 2.25M | 4.10M | 8.86M | 13.16M | 17.87M | 25.11M |
25 | 403k | 984k | 1.84M | 2.78M | 4.19M | 4.13M |
26 | 719k | 1.62M | 3.67M | 5.47M | 8.24M | 8.40M |
27 | 1.47M | 2.21M | 4.89M | 10.38M | 10.96M | 10.96M |
28 | 1.75M | 3.53M | 6.48M | 11.99M | 14.91M | 22.03M |
29 | 2.05M | 3.92M | 9.66M | 13.06M | 16.71M | 22.44M |
Main Heap | 1 GB* | 2 GB | 4 GB | 6 GB | 8 GB | 10 GB |
---|---|---|---|---|---|---|
IPv6 Routes | 2.05M | 3.76M | 7.47M | 11.52M | 15.57M | 22.21M |
Note
An asterisk (*) denotes the default allocation size.
Tip
As mentioned in Memory, increasing the main heap size beyond the default huge page allocation of 2GB may require increasing huge pages as well. See Host Memory Management Configuration for details.
Also consider increasing the page size to avoid delays in memory allocation. See Page Size for details.
Example¶
Continuing the previous example of 900,000 IPv4 prefixes and 140,000 IPv6 prefixes, going by the worst case scenario of every IPv4 route being a /25, that translates to approximately 4GB of main heap for IPv4 and 1GB for IPv6. Since other parts of the dataplane consume main heap memory as well, 6GB is a reasonable minimum for that scenario:
tnsr(config)# dataplane memory main-heap-size 6G
tnsr(config)# service dataplane restart
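The table lookups in this section can be sketched as follows. This is a hypothetical helper, not a TNSR tool: it returns the smallest tabled heap size whose measured maximum covers a route count for one address family. It does not combine IPv4 and IPv6 needs or account for other main heap consumers, so a real choice should add headroom on top of its result, as the example above does.

```python
# Main heap sizes (GB) and measured maximum route counts, copied from the
# tables above. The IPv4 row for key 24 covers all prefix lengths <= 24.
HEAP_SIZES_GB = [1, 2, 4, 6, 8, 10]
IPV4_MAX_ROUTES = {
    24: [2_250_000, 4_100_000, 8_860_000, 13_160_000, 17_870_000, 25_110_000],
    25: [403_000, 984_000, 1_840_000, 2_780_000, 4_190_000, 4_130_000],
    26: [719_000, 1_620_000, 3_670_000, 5_470_000, 8_240_000, 8_400_000],
    27: [1_470_000, 2_210_000, 4_890_000, 10_380_000, 10_960_000, 10_960_000],
    28: [1_750_000, 3_530_000, 6_480_000, 11_990_000, 14_910_000, 22_030_000],
    29: [2_050_000, 3_920_000, 9_660_000, 13_060_000, 16_710_000, 22_440_000],
}
IPV6_MAX_ROUTES = [2_050_000, 3_760_000, 7_470_000, 11_520_000, 15_570_000, 22_210_000]


def ipv4_row(prefix_len: int):
    """Return the table row for an IPv4 prefix length (<= 24 share one row)."""
    key = max(prefix_len, 24)
    if key not in IPV4_MAX_ROUTES:
        raise ValueError(f"no measurements for /{prefix_len} in the table")
    return IPV4_MAX_ROUTES[key]


def smallest_heap_gb(routes: int, limits):
    """Smallest tabled heap size (GB) whose measured maximum covers `routes`.

    Returns None when no tabled size is large enough.
    """
    for size_gb, limit in zip(HEAP_SIZES_GB, limits):
        if limit >= routes:
            return size_gb
    return None


if __name__ == "__main__":
    print(smallest_heap_gb(1_500_000, ipv4_row(26)))  # 1.5M /26 routes
    print(smallest_heap_gb(140_000, IPV6_MAX_ROUTES))  # 140k IPv6 routes
```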
Linux-cp Netlink Socket Buffer Sizing¶
Dataplane memory allocation is one part of handling large numbers of routes, but the routes must also be passed between the operating system FIB and the dataplane. For example, this is the mechanism which exchanges routes between the dynamic routing daemon (FRR) and the dataplane.
When dealing with large numbers of routes received in a short time frame, the netlink socket buffer used to exchange these routes may be overrun. If this happens, routes may be lost which leads to a mismatch between the operating system and dataplane FIBs.
The default size of the netlink socket buffer is 128MB which is typically sufficient for around 2M routes but varies depending on hardware and other aspects of the configuration and environment.
Unlike the values discussed earlier in this document, the requirements for the netlink socket buffer are not consistent enough to create tables from which a value can be determined. Configure the other values appropriately and then continue on to Testing and Validating Memory Requirements. The validation process for the netlink socket buffer size is explained in that section. If that process determines the size is insufficient, increase it until the tests no longer fail.
Example¶
Though the example used so far would likely work within the default value, this example doubles the default to 256MB so there is room to spare. The size value in this command is specified in bytes, so multiply 256*1024*1024 for a total of 268435456 bytes.
tnsr(config)# dataplane linux-cp nl-rx-buffer-size 268435456
tnsr(config)# service dataplane restart
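Because the buffer size is given in bytes, a small conversion helper avoids arithmetic slips. The sketch below is illustrative only; the function names are hypothetical, and the command string it builds is the one shown above.

```python
def mib_to_bytes(mib: int) -> int:
    """Convert a size in MB (1024 * 1024 bytes each) to bytes."""
    return mib * 1024 * 1024


def nl_rx_buffer_command(mib: int) -> str:
    """Build the TNSR CLI line for a netlink buffer of `mib` MB."""
    return f"dataplane linux-cp nl-rx-buffer-size {mib_to_bytes(mib)}"


if __name__ == "__main__":
    print(nl_rx_buffer_command(256))
    # prints: dataplane linux-cp nl-rx-buffer-size 268435456
```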
Testing and Validating Memory Requirements¶
TNSR includes a route testing utility at /usr/bin/route-test. This utility adds IPv4 or IPv6 routes quickly via netlink, which is the same method used by the dynamic routing daemon (FRR/zebra) to add routes it receives via BGP.
This utility can aid in validating memory parameters and help in tuning linux-cp parameters such as the netlink socket buffer size (Linux-cp Configuration).
For IPv4 routes, the default behavior of the utility is to add /24 routes sequentially starting at 1.0.0.0/24. It skips the loopback prefix (127/8) and the prefix which contains the gateway address used with the routes. It stops when it reaches the start of multicast address space (224/8).
For IPv6 routes, the default behavior is to add /64 routes sequentially starting at 2000::/64. It skips the prefix which contains the gateway address used with the routes and stops when it reaches the end of global unicast address space (4000::/3).
After selecting appropriate sizes for the statistics segment (Statistics Segment Memory Sizing) and main heap (Main Heap Memory Sizing) based on the tables in those sections, use route-test to add the expected number of routes. This process will validate that the memory allocations are sufficient to support that number of routes.
Route Test Utility Usage¶
The syntax for this utility is:
# /usr/bin/route-test -h
/usr/bin/route-test -g <gateway_address> -n <num_routes> [-h] [-6] [-l <len>]
-h - Display this message
-6 - Add IPv6 routes (IPv4 by default)
-n <number_of_routes>
-g <gateway_address>
-l <prefix_length>
To use the utility, supply a gateway address and a number of routes to add. For example, the following command will add 1M routes which use 198.51.100.2 as the next-hop/gateway address.
$ sudo dp-exec route-test -g 198.51.100.2 -n 1000000
Note
For the routes to be added successfully, TNSR must be configured so that the next hop address can be resolved. In this example, TNSR must know how to reach 198.51.100.2. This could be accomplished by configuring 198.51.100.1/24 on an interface and bringing it up.
The routes are added to the Linux kernel route table via netlink, thus the program must be run as a privileged user, which is why the example command is run via sudo. Alternately, it could be run in a root shell without sudo.
The utility must be run in the dataplane network namespace for the routes to be added to the dataplane FIB by the linux-nl plugin, which is the reason to run it using dp-exec (Namespaces in Shell Commands). The dp-exec command can be omitted by opening a shell in the dataplane namespace from the TNSR CLI:
tnsr# dataplane shell sudo bash
# route-test -g 198.51.100.2 -n 1000000
Or:
tnsr# dataplane shell sudo route-test -g 198.51.100.2 -n 1000000
The utility adds /24 routes by default for IPv4. There are a finite number of unicast /24 prefixes available (around 14M) as shown in Counts of unicast prefixes. Routes with other prefix lengths can be added via the -l <len> argument. The argument -l 25 will instruct the utility to add /25 routes (1.0.0.0/25, 1.0.0.128/25, 1.0.1.0/25, etc.) instead.
Prefix length | Available unicast prefixes |
---|---|
24 | 14.54M |
23 | 7.27M |
22 | 3.63M |
21 | 1.81M |
20 | 909k |
19 | 454k |
18 | 227k |
17 | 113k |
16 | 56k |
15 | 28k |
14 | 14k |
13 | 7k |
12 | 3551 |
11 | 1775 |
10 | 887 |
9 | 443 |
8 | 221 |
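The counts in this table can be reproduced with simple arithmetic. The sketch below rests on an inference from the numbers rather than on anything the utility documents: 222 usable /8 blocks (1/8 through 223/8, excluding 127/8), split into prefixes of the requested length, minus the one prefix that contains the gateway address. The function name is hypothetical.

```python
# Assumption inferred from the table: unicast space is 1/8 through 223/8
# without the loopback block 127/8, which leaves 222 blocks of size /8.
USABLE_SLASH8_BLOCKS = 223 - 1


def available_unicast_prefixes(prefix_len: int) -> int:
    """Estimate how many IPv4 prefixes of `prefix_len` the utility can add.

    One prefix is subtracted for the prefix containing the gateway address.
    """
    if not 8 <= prefix_len <= 32:
        raise ValueError("prefix length must be between 8 and 32")
    return USABLE_SLASH8_BLOCKS * 2 ** (prefix_len - 8) - 1


if __name__ == "__main__":
    for length in (24, 20, 12, 8):
        print(length, available_unicast_prefixes(length))
```

When requesting more routes than a given prefix length can supply, a longer prefix length is needed to reach the target count.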
If an expected distribution of routes is known by prefix length (e.g. 2M total routes will consist of 1M /24, 500k /23, 250k /22, 250k /21), the program can be run several times in succession with different values of -l <len> to simulate that distribution. This is a valuable exercise due to the way data is structured in the main heap to optimize the speed of FIB lookups. Routes with higher prefix length may consume more memory on the main heap than routes with a lower length. For example, a /27 route may cause additional memory to be consumed beyond what is required for a /24 route. This behavior does not apply to the statistics segment or IPv6 routes; it only applies to IPv4 routes in the main heap.
Tip
The best practice is to validate memory parameters using a distribution similar to what will be seen in production use if that data is available, or to use the worst case. If /27 routes are the longest prefix length expected to be received via BGP, use -l 27 to add /27 routes in order to validate the memory allocations.
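Running the utility once per prefix length is easy to script. The sketch below only builds the command lines for a distribution and prints them; it does not execute anything. The helper name is hypothetical, while the flags (-g, -n, -l) and the dp-exec wrapper are the ones described in this section.

```python
import shlex


def route_test_commands(gateway: str, distribution: dict) -> list:
    """Build one route-test command line per prefix length.

    `distribution` maps an IPv4 prefix length to the number of routes to add.
    """
    commands = []
    for prefix_len, count in distribution.items():
        argv = [
            "sudo", "dp-exec", "route-test",
            "-g", gateway,
            "-n", str(count),
            "-l", str(prefix_len),
        ]
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    # The 2M route distribution used as an example in this section.
    example = {24: 1_000_000, 23: 500_000, 22: 250_000, 21: 250_000}
    for line in route_test_commands("198.51.100.2", example):
        print(line)
```

Printing the commands first makes it possible to review them before running them on a test system.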
Interpreting Test Results¶
When all iterations of route-test are complete, validate that routes were added to the FIB by running sudo vppctl show ip fib summary from a shell. This will display the counts of IPv4 routes of each length. sudo vppctl show ip6 fib summary shows similar statistics for IPv6 routes, though memory consumption is not tied to prefix length for IPv6 routes the way it is for IPv4 routes.
If the dataplane (VPP) crashes while running route-test, add 25% to the size of the main heap (Main Heap Memory Sizing) and statistics segment (Statistics Segment Memory Sizing) and repeat the test.
In addition to testing memory allocation, running this tool also exercises the Linux-cp netlink socket buffer. If sudo vppctl show ip fib summary or sudo vppctl show ip6 fib summary shows a lower count of routes than requested during the test, the netlink socket buffer may have overflowed, forcing the kernel to drop some of the route announcements it was trying to send. In that case, the socket buffer size may need to be increased.
In addition to checking the route counts, check the logs using sudo vppctl show log and by inspecting the contents of /var/log/messages for error messages about the socket overflowing.
If the socket overflows during the tests, increase the size of the socket buffer (Linux-cp Netlink Socket Buffer Sizing).
NAT¶
Increasing the number of NAT sessions per thread (NAT Sizing Options) requires additional increases in main heap memory based on the number of worker threads and NAT mode (NAT Modes).
The amount of memory consumed per session depends on the NAT mode. Endpoint-dependent NAT mode consumes slightly more memory per session than endpoint-independent mode. The memory consumed per session increases in a linear manner as session limits increase, with each session consuming approximately the same amount of memory on average:
Endpoint-independent NAT mode: 228 Bytes per session
Endpoint-dependent NAT mode: 353 Bytes per session
Multiply the value for the NAT mode by the max-translations-per-thread NAT configuration value and the number of worker threads to reach a minimum safe starting value for the amount of memory required by NAT in the main heap.
<nat mode session size> * <max-translations-per-thread> * <workers>
The table NAT Memory by Sessions per Thread and NAT Mode below has memory usage values based on several single-thread session counts for easy estimation.
Translations | NAT44 EI Mode | NAT44 ED Mode |
---|---|---|
128,000 | 29.2 M | 45.2 M |
256,000 | 58.4 M | 90.4 M |
512,000 | 116.7 M | 180.7 M |
1,000,000 | 228.0 M | 353.0 M |
2,000,000 | 456.0 M | 706.0 M |
4,000,000 | 912.0 M | 1412.0 M |
Note
This calculation only accounts for NAT. The main thread itself also uses memory, and the routing table adds to main heap usage as well. Thus, the actual requirement is likely to be higher than this calculated minimum.
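The NAT formula in this section can be sketched as a short function. The per-session byte values come from this section; the function name and the "ei"/"ed" mode keys are hypothetical labels chosen for the example. The result covers NAT only and is a starting point, not a final main heap size.

```python
# Approximate memory per NAT session in bytes, from this section.
BYTES_PER_SESSION = {
    "ei": 228,  # endpoint-independent NAT mode
    "ed": 353,  # endpoint-dependent NAT mode
}


def nat_heap_bytes(mode: str, max_translations_per_thread: int, workers: int) -> int:
    """Minimum main heap bytes for NAT:
    <nat mode session size> * <max-translations-per-thread> * <workers>
    """
    return BYTES_PER_SESSION[mode] * max_translations_per_thread * workers


if __name__ == "__main__":
    # One thread with 128,000 translations in each mode, as in the table.
    print(nat_heap_bytes("ei", 128_000, 1))
    print(nat_heap_bytes("ed", 128_000, 1))
    # Four workers with 1,000,000 translations each, endpoint-dependent.
    print(nat_heap_bytes("ed", 1_000_000, 4))
```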
An alternate tactic to reduce maximum session requirements and associated memory requirements is to reduce the NAT session timeout. Shorter sessions are removed from memory faster than longer sessions, and thus are less likely to exist concurrently with other sessions. The exact values depend upon the environment and types of connections passing through TNSR. See NAT Session Timeout Duration for details on the various timer values.
API Segment¶
In high volume environments with large amounts of route changes in a short time frame, it may be necessary to increase the amount of RAM the system dedicates to messages for the internal binary API (API Segment).
The API segment defaults are currently 64M for the global size and 16M for the API size. The global size must be larger than the API size, so when increasing the API size, increase the global size in a similar fashion.
For example, to increase the global size to 512M and the API size to 256M:
tnsr(config)# dataplane api-segment global-size 512M
tnsr(config)# dataplane api-segment api-size 256M
tnsr(config)# service dataplane restart