Initializing
Overview
The fdctl configure command sets up the host operating system so that Firedancer can run correctly. It does the following:
- hugetlbfs Reserves huge and gigantic pages for use by Firedancer.
- sysctl Sets required kernel parameters.
- hyperthreads Checks the hyperthreaded pair of each critical CPU core.
- ethtool-channels Configures the number of channels on the network device.
- ethtool-offloads Disables generic-receive-offload (GRO) and GRE segmentation offload on the network device.
- ethtool-loopback Disables tx-udp-segmentation on the loopback device.
The hugetlbfs stage must be run again every time the system is rebooted, to remount the hugetlbfs filesystems. The sysctl, ethtool-channels and ethtool-offloads stages must also be rerun after each reboot, since the kernel parameters and network device settings they apply do not persist.
The configure command is run like fdctl configure <mode> <stage>... where mode is one of:
- init Configures the provided stages if they are not already configured.
- check Check if each stage is already configured. The command will exit with an error code if they are not. check never requires privileges and will not make any changes to the system.
- fini Unconfigure (reverse) the stage if it is reversible.
stage can be one or more of hugetlbfs, sysctl, hyperthreads, ethtool-channels, ethtool-offloads, ethtool-loopback, and snapshots, each of which is described below. You can also use the stage all, which will configure everything.
Stages have different privilege requirements, which you can see by trying to run the stage without privileges. The check mode never requires privileges, and the init mode will only require privileges if it needs to actually change something.
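As a rough illustration of the command shape above, the following Python sketch builds and validates the argument list for fdctl configure. The helper names build_configure_argv and is_configured are hypothetical and not part of Firedancer; the mode and stage names are taken from the lists above, and it is assumed that fdctl is on the PATH.

```python
import subprocess
from typing import List

# Mode and stage names as listed in this document.
MODES = ("init", "check", "fini")
STAGES = (
    "hugetlbfs", "sysctl", "hyperthreads", "ethtool-channels",
    "ethtool-offloads", "ethtool-loopback", "snapshots",
)


def build_configure_argv(mode: str, stages: List[str]) -> List[str]:
    """Return the argv for `fdctl configure <mode> <stage>...` (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if not stages:
        raise ValueError("at least one stage is required")
    for stage in stages:
        # `all` is the special stage that selects everything.
        if stage != "all" and stage not in STAGES:
            raise ValueError(f"unknown stage: {stage!r}")
    return ["fdctl", "configure", mode, *stages]


def is_configured(stages: List[str]) -> bool:
    """Run the check mode, which exits with an error code if a stage is unconfigured."""
    result = subprocess.run(build_configure_argv("check", stages))
    return result.returncode == 0
```

Since check never requires privileges, a wrapper like is_configured(["all"]) could be called from an unprivileged script before deciding whether to invoke init with sudo.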
hugetlbfs
The hugetlbfs stage is used to reserve huge (2MiB) and gigantic (1GiB) memory pages from the Linux kernel for use by Firedancer. See also the kernel documentation of these pages. Almost all memory in Firedancer is allocated out of these pages for performance reasons.
This is a two-step process. First, the number of huge and gigantic pages available on the entire system is increased in the kernel by increasing /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages until the free_hugepages value is high enough for all the memory needs of the validator.
Once the pages have been reserved globally in the kernel pool, they are assigned specifically to Firedancer by creating a hugetlbfs mount at each of /mnt/.fd/.gigantic/ and /mnt/.fd/.huge for gigantic and huge pages respectively. These paths can be configured in the TOML file under the [hugetlbfs] section. Let's run it:
$ sudo fdctl configure init hugetlbfs
NOTICE hugetlbfs ... unconfigured ... mounts `/mnt/.fd/.huge` and `/mnt/.fd/.gigantic` do not exist
NOTICE hugetlbfs ... configuring
NOTICE RUN: `mkdir -p /mnt/.fd/.huge`
NOTICE RUN: `mount -t hugetlbfs none /mnt/.fd/.huge -o pagesize=2097152,min_size=228589568`
NOTICE RUN: `mkdir -p /mnt/.fd/.gigantic`
NOTICE RUN: `mount -t hugetlbfs none /mnt/.fd/.gigantic -o pagesize=1073741824,min_size=27917287424`
$ cat /proc/mounts | grep \\.fd
none /mnt/.fd/.gigantic hugetlbfs rw,seclabel,relatime,pagesize=1024M,min_size=540092137472 0 0
none /mnt/.fd/.huge hugetlbfs rw,seclabel,relatime,pagesize=2M,min_size=95124124 0 0
This stage requires root privileges, and cannot be performed with capabilities. If the required hugetlbfs mounts are already present, with at least the amount of memory reserved that we require, then the init mode does nothing and the check mode will return successfully without requiring privileges.
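To relate the mount options to page counts, here is a minimal Python sketch that parses a pagesize=...,min_size=... option string like the ones in the mount commands above and divides the two. The function names are hypothetical, and the sketch assumes both options are plain byte counts, as they are in the logged mount commands.

```python
from typing import Dict


def parse_mount_options(options: str) -> Dict[str, int]:
    """Parse `key=value` mount options, keeping only plain integer values."""
    parsed: Dict[str, int] = {}
    for item in options.split(","):
        key, sep, value = item.partition("=")
        if sep and value.isdigit():
            parsed[key] = int(value)
    return parsed


def reserved_pages(options: str) -> int:
    """Number of whole pages covered by min_size (assumes byte units)."""
    opts = parse_mount_options(options)
    return opts["min_size"] // opts["pagesize"]


# Option strings copied from the mount commands logged above.
huge = reserved_pages("pagesize=2097152,min_size=228589568")
gigantic = reserved_pages("pagesize=1073741824,min_size=27917287424")
print(huge, gigantic)  # prints "109 26"
```

For the example run above, this works out to 109 huge pages and 26 gigantic pages. The actual sizes depend on your configuration.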
The fini mode will unmount the two filesystems, and remove them from /mnt/.fd/, although it will leave the /mnt/.fd/ directory in place. The fini mode will not succeed if memory from the mounts is mapped into a running process.
If fini succeeds, the huge and gigantic pages that Firedancer had reserved will be returned to the kernel global pool so they can be used by other programs, but the global pool size will not be decreased, even if it was earlier increased during init.
TIP
The hugetlbfs step should be run immediately when the system is booted. If run later, it may fail because the operating system memory is fragmented and a large contiguous block cannot be reserved.
sysctl
It is suggested to run Firedancer with certain kernel parameters tuned for best performance. The sysctl stage will check and configure these parameters. The stage will only increase values to meet the minimum, and will not decrease them if the minimum is already met.
| Sysctl | Minimum | Required | Description |
|---|---|---|---|
| /proc/sys/vm/max_map_count | 1000000 | Yes | Agave accounts database requires mapping many files. |
| /proc/sys/fs/file-max | 1024000 | Yes | Agave accounts database requires opening many files. |
| /proc/sys/fs/nr_open | 1024000 | Yes | Agave accounts database requires opening many files. |
| /proc/sys/net/ipv4/conf/lo/rp_filter | 2 | Yes | If sending QUIC transactions to Firedancer over loopback, this must be enabled to receive a response. Otherwise Linux will drop response packets due to limitations in the kernel eBPF networking stack. The sendTransaction RPC call will send over loopback. |
| /proc/sys/net/ipv4/conf/lo/accept_local | 1 | Yes | If sending QUIC transactions to Firedancer over loopback, this must be enabled to receive a response. Otherwise Linux will drop response packets due to limitations in the kernel eBPF networking stack. The sendTransaction RPC call will send over loopback. |
| /proc/sys/net/core/bpf_jit_enable | 1 | No | Firedancer uses BPF for kernel bypass networking. BPF JIT makes this faster. |
| /proc/sys/kernel/numa_balancing | 0 | No | Firedancer assigns all memory to the right NUMA node, and rebalancing will make the system slower. |
Sysctls that are not required will produce a warning if they are not set correctly, but configuration will proceed and exit normally.
The init mode requires either root privileges, or to be run with CAP_SYS_ADMIN. The fini mode does nothing and kernel parameters will never be reduced or changed back as a result of running configure.
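The check described above can be approximated with a short Python sketch that compares current values against the table. This is not Firedancer's implementation. The Sysctl type, the check_sysctls function and the message format are hypothetical, and treating numa_balancing as an exact match rather than a minimum is an assumption made here because a minimum of 0 would otherwise always be met.

```python
from typing import Callable, List, NamedTuple


class Sysctl(NamedTuple):
    path: str
    value: int
    required: bool
    exact: bool = False  # assumption of this sketch, not a column in the table


# Values copied from the table above.
SYSCTLS = [
    Sysctl("/proc/sys/vm/max_map_count", 1000000, True),
    Sysctl("/proc/sys/fs/file-max", 1024000, True),
    Sysctl("/proc/sys/fs/nr_open", 1024000, True),
    Sysctl("/proc/sys/net/ipv4/conf/lo/rp_filter", 2, True),
    Sysctl("/proc/sys/net/ipv4/conf/lo/accept_local", 1, True),
    Sysctl("/proc/sys/net/core/bpf_jit_enable", 1, False),
    Sysctl("/proc/sys/kernel/numa_balancing", 0, False, exact=True),
]


def read_sysctl(path: str) -> int:
    """Read an integer kernel parameter from /proc/sys."""
    with open(path) as f:
        return int(f.read().split()[0])


def check_sysctls(read: Callable[[str], int] = read_sysctl) -> List[str]:
    """Return one message per parameter that does not match the table."""
    problems = []
    for s in SYSCTLS:
        current = read(s.path)
        ok = current == s.value if s.exact else current >= s.value
        if not ok:
            level = "ERROR" if s.required else "WARNING"
            qualifier = "exactly" if s.exact else "at least"
            problems.append(f"{level} {s.path} is {current}, expected {qualifier} {s.value}")
    return problems
```

Passing the reader as a parameter keeps the comparison logic testable without touching /proc/sys. Calling check_sysctls() with no arguments reads the live values, which needs no privileges.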
hyperthreads
Most work in Firedancer can be scaled with the number of CPU cores, but there are two jobs (tiles) which must run serially on a single core:
- pack Responsible for scheduling transactions for execution when we are leader.
- poh Performs repeated sha256 hashes, and periodically stamps these hashes into in-progress blocks when we are leader.
Because any interruption, context switch, or sharing of the CPU core that these jobs run on could cause skipped leader slots or unfull blocks, Firedancer expects them to get a dedicated core. This means on machines with a hyperthreaded CPU, the hyperthreaded pair of these tiles should be switched to offline.
This stage looks to see if the CPU is hyperthreaded, and will print a warning for the operator if the pair of these tiles are used or online.
A typical warning on a hyperthreaded system with auto layout looks like this:
$ sudo fdctl configure init hyperthreads
WARNING pack cpu 5 has hyperthread pair cpu 29 which should be offline. Proceeding but performance may be reduced.
WARNING poh cpu 9 has hyperthread pair cpu 33 which should be offline. Proceeding but performance may be reduced.
When using the auto layout, Firedancer will ensure no other tiles are assigned to run on the hyperthread pairs, but if using a manual layout, it is possible to assign another tile to the pair.
This stage has no dependencies on any other stage, but it is dependent on the topology specified in your configuration. It is recommended that you turn off the CPUs specified in the warning for optimal performance.
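To see which CPU is the hyperthread pair of a given core, Linux exposes a thread_siblings_list file per CPU in sysfs. The Python sketch below parses that list format. The function names are hypothetical, and this is only an illustration of the lookup, not how fdctl itself performs the check.

```python
from typing import List


def parse_cpu_list(text: str) -> List[int]:
    """Parse a sysfs CPU list such as "5,29" or "0-3,8" into CPU indices."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        lo, sep, hi = part.partition("-")
        if sep:
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(lo))
    return cpus


def hyperthread_pairs(cpu: int, siblings_text: str) -> List[int]:
    """Return the sibling CPUs that share a physical core with `cpu`."""
    return [c for c in parse_cpu_list(siblings_text) if c != cpu]


def read_siblings(cpu: int) -> str:
    """Read the sibling list for `cpu` from sysfs (Linux only)."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    with open(path) as f:
        return f.read()
```

On the system from the warning above, hyperthread_pairs(5, read_siblings(5)) would be expected to return [29]. A CPU can then be taken offline by writing 0 to /sys/devices/system/cpu/cpu29/online as root.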
ethtool-channels
In addition to XDP, Firedancer uses receive side scaling (RSS) to improve network performance. This uses functionality of modern NICs to steer packets to different queues to distribute processing among CPUs. See the kernel documentation for more information.
In Firedancer, each net tile serves just one network queue, so the ethtool-channels stage will modify the network device [net.interface] configuration such that all packets needed by Firedancer are steered to the proper queue(s). There are three modes, selectable in your configuration, that govern this behavior:
- simple mode modifies the combined channel count of the configured network device to be the same as the number of net tiles, [layout.net_tile_count]. If your NIC does not support the required number of queues, you will need to reduce the number of net tiles, potentially down to one for NICs which don't support queues at all. This is the default mode and should work for all network devices. Because the queue count is reduced system-wide, not solely for Firedancer, this can have a negative performance impact on non-Firedancer network traffic.
- dedicated mode reserves a dedicated hardware queue for each net tile. This is the more advanced mode and may not work with all network devices. By modifying the RXFH indirection table and installing ntuple rules, Firedancer traffic is directed onto the dedicated queues and all other traffic is sharded amongst the rest. This has a performance benefit for both Firedancer and non-Firedancer traffic.
- auto mode attempts to initialize the device in dedicated mode and automatically falls back to simple mode if any failure occurs.
The command run by the stage in simple mode is similar to running ethtool --set-channels <device> combined <N> but it also supports bonded devices. We can check that it worked:
$ sudo fdctl configure init ethtool-channels
NOTICE ethtool-channels ... unconfigured ... device `ens3f0` does not have right number of channels (got 1 but expected 2)
NOTICE ethtool-channels ... configuring
NOTICE ethtool-channels ... RUN: `ethtool --set-channels ens3f0 combined 2`
$ ethtool --show-channels ens3f0
Channel parameters for ens3f0:
Pre-set maximums:
RX: 64
TX: 64
Other: 1
Combined: 64
Current hardware settings:
RX: 0
TX: 0
Other: 1
Combined: 2
The stage only needs to be run once after boot but before running Firedancer. It has no dependencies on any other stage, although it is dependent on the number of net tiles in your configuration.
Changing device settings with ethtool-channels requires root privileges, and cannot be performed with capabilities.
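If you want to verify the channel count from a script, the output of ethtool --show-channels can be parsed. The Python sketch below does this for the sample output shown above. The function names are hypothetical, and the parser assumes the two section headings appear exactly as in that sample.

```python
from typing import Dict, Optional

# Sample output copied from the transcript above.
SAMPLE = """\
Channel parameters for ens3f0:
Pre-set maximums:
RX: 64
TX: 64
Other: 1
Combined: 64
Current hardware settings:
RX: 0
TX: 0
Other: 1
Combined: 2
"""


def parse_channels(output: str) -> Dict[str, Dict[str, int]]:
    """Split `ethtool --show-channels` output into maximum and current counts."""
    sections: Dict[str, Dict[str, int]] = {"maximums": {}, "current": {}}
    section: Optional[str] = None
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Pre-set maximums"):
            section = "maximums"
        elif line.startswith("Current hardware settings"):
            section = "current"
        elif section and ":" in line:
            key, _, value = line.partition(":")
            value = value.strip()
            if value.isdigit():  # skips non-numeric values such as "n/a"
                sections[section][key.strip()] = int(value)
    return sections


def channels_match(output: str, net_tile_count: int) -> bool:
    """True if the current combined channel count equals the net tile count."""
    return parse_channels(output)["current"].get("Combined") == net_tile_count


print(channels_match(SAMPLE, 2))  # prints "True"
```

This mirrors the simple mode expectation that the combined channel count equals the number of net tiles. It does not cover the dedicated mode, which also changes the RXFH indirection table and ntuple rules.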
ethtool-offloads
XDP is incompatible with a feature of network devices called generic-receive-offload. This feature must be disabled for Firedancer to work. GRE segmentation offload is also disabled.
The command run by the stage is similar to running ethtool --offload <device> <offload> off but it also supports bonded devices. We can check that it worked:
$ sudo fdctl configure init ethtool-offloads
NOTICE ethtool-offloads ... unconfigured ... device `ens3f0np0` has generic-receive-offload enabled. Should be disabled
NOTICE ethtool-offloads ... configuring
NOTICE RUN: `ethtool --offload ens3f0np0 generic-receive-offload off`
NOTICE RUN: `ethtool --features ens3f0np0 tx-gre-segmentation off`
NOTICE RUN: `ethtool --offload lo generic-receive-offload off`
$ ethtool --show-offload ens3f0np0 | grep generic-receive-offload
generic-receive-offload: off
$ ethtool --show-offload ens3f0np0 | grep tx-gre-segmentation
tx-gre-segmentation: off
The stage only needs to be run once after boot but before running Firedancer. It has no dependencies on any other stage.
Changing device settings with ethtool-offloads requires root privileges, and cannot be performed with capabilities.
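The grep checks above can also be done programmatically. The following Python sketch parses ethtool --show-offload output into a mapping of feature name to on or off state. The function names are hypothetical, and the sample text is a shortened, illustrative excerpt rather than real device output.

```python
from typing import Dict, List

# Illustrative excerpt in the format printed by `ethtool --show-offload`.
SAMPLE = """\
Features for ens3f0np0:
generic-receive-offload: off
tx-gre-segmentation: off
tx-udp-segmentation: on [fixed]
"""


def parse_offloads(output: str) -> Dict[str, bool]:
    """Map each feature name to True (on) or False (off)."""
    features: Dict[str, bool] = {}
    for line in output.splitlines():
        name, sep, state = line.partition(":")
        words = state.split()
        # Header lines have nothing after the colon and are skipped.
        if sep and words and words[0] in ("on", "off"):
            features[name.strip()] = words[0] == "on"
    return features


def still_enabled(output: str, names: List[str]) -> List[str]:
    """Return the listed features that are still enabled."""
    features = parse_offloads(output)
    return [n for n in names if features.get(n, False)]


print(still_enabled(SAMPLE, ["generic-receive-offload", "tx-gre-segmentation"]))  # prints "[]"
```

An empty result means both offloads that this stage disables are already off on the device.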
ethtool-loopback
XDP is incompatible with localhost UDP traffic using a feature called tx-udp-segmentation. This feature must be disabled when connecting Agave clients to Firedancer over loopback, or when using Frankendancer.
The command run by the stage is ethtool --offload lo tx-udp-segmentation off. We can check that it worked:
$ sudo fdctl configure init ethtool-loopback
NOTICE ethtool-loopback ... unconfigured ... device `lo` has tx-udp-segmentation enabled. Should be disabled
NOTICE ethtool-loopback ... configuring
NOTICE ethtool-loopback ... RUN: `ethtool --offload lo tx-udp-segmentation off`
$ ethtool --show-offload lo | grep tx-udp-segmentation
tx-udp-segmentation: off
The stage only needs to be run once after boot but before running Firedancer. It has no dependencies on any other stage.
Changing device settings with ethtool-loopback requires root privileges, and cannot be performed with capabilities.
snapshots
When starting up, validators must load a snapshot to catch up to the current state of the blockchain. Snapshots are downloaded from other validator peers in the cluster and are stored to a snapshots directory.
In init, the snapshots configure phase will create the snapshots directory if it does not exist. In fini, the snapshots configure phase will remove the snapshots directory recursively.
NOTE
The snapshots configure phase is only enabled in the Firedancer binary.
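The init and fini behavior of the snapshots stage amounts to an idempotent create and a recursive remove. A minimal Python sketch of that behavior, using hypothetical function names and a temporary directory in place of the real snapshots path, could look like this:

```python
import os
import shutil
import tempfile


def snapshots_init(path: str) -> None:
    """Create the snapshots directory if it does not already exist."""
    os.makedirs(path, exist_ok=True)


def snapshots_fini(path: str) -> None:
    """Remove the snapshots directory and everything inside it."""
    shutil.rmtree(path, ignore_errors=True)


# Demonstration in a throwaway directory; the real path comes from your configuration.
with tempfile.TemporaryDirectory() as base:
    path = os.path.join(base, "snapshots")
    snapshots_init(path)
    snapshots_init(path)  # running init again is a no-op
    created = os.path.isdir(path)
    snapshots_fini(path)
    removed = not os.path.exists(path)
```

Because fini removes the directory recursively, any downloaded snapshots stored there are deleted along with it.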