Initializing
Overview
The fdctl configure command is used to set up the host operating system so Firedancer can run correctly. It does the following:
- hugetlbfs Reserves huge and gigantic pages for use by Firedancer.
- sysctl Sets required kernel parameters.
- hyperthreads Disables the hyperthread pair of critical CPU cores.
- ethtool-channels Configures the number of channels on the network device.
- ethtool-gro Disables generic-receive-offload (GRO) on the network device.
- ethtool-loopback Disables tx-udp-segmentation on the loopback device.
The hugetlbfs configuration must be performed again every time the system is rebooted, to remount the hugetlbfs filesystems. The same applies to sysctl, ethtool-channels and ethtool-gro, which reconfigure kernel parameters and the networking device, and to hyperthreads, which configures CPU cores.
The configure command is run like fdctl configure <mode> <stage>... where mode is one of:
- init Configures the provided stages if they are not already configured.
- check Checks if each stage is already configured. The command will exit with an error code if they are not. check never requires privileges and will not make any changes to the system.
- fini Unconfigures (reverses) the stage if it is reversible.
stage can be one or more of hugetlbfs, sysctl, hyperthreads, ethtool-channels, ethtool-gro and ethtool-loopback. These stages are described below. You can also use the stage all, which will configure everything.
Stages have different privilege requirements, which you can see by trying to run the stage without privileges. The check mode never requires privileges, and the init mode will only require privileges if it needs to actually change something.
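Because check never requires privileges and exits with an error code when a stage is not configured, one possible pattern is to run check first and only escalate to init when needed. The sketch below assumes fdctl is on the PATH. The ensure_configured function and the FDCTL and SUDO environment overrides are hypothetical helpers introduced for illustration and are not part of fdctl.

```shell
#!/bin/sh
# Hypothetical helper: run `check` unprivileged, and only call `init`
# (via sudo) when check reports that a stage is not configured.
# FDCTL and SUDO are illustrative overrides, not fdctl features.
ensure_configured() {
    fdctl_bin="${FDCTL:-fdctl}"
    sudo_cmd="${SUDO-sudo}"
    if "$fdctl_bin" configure check "$@"; then
        echo "already configured: $*"
    else
        echo "configuring: $*"
        # $sudo_cmd is intentionally unquoted so an empty value disappears.
        $sudo_cmd "$fdctl_bin" configure init "$@"
    fi
}
```

For example, ensure_configured all would check every stage and only ask for privileges if at least one of them still needs to be configured.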
hugetlbfs
The hugetlbfs stage is used to reserve huge (2MiB) and gigantic (1GiB) memory pages from the Linux kernel for use by Firedancer. See also the kernel documentation of these pages. Almost all memory in Firedancer is allocated out of these pages for performance reasons.
This is a two-step process. First, the number of huge and gigantic pages available on the entire system is increased in the kernel by raising nr_hugepages (for example /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages for gigantic pages on NUMA node 0) until the free_hugepages value is high enough for all the memory needs of the validator.
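To see the counters this step works with, the per-node sysfs files can be read directly. The following is a minimal read-only sketch and not how fdctl itself is implemented. The show_hugepages name and its optional root-directory argument are assumptions made for illustration.

```shell
#!/bin/sh
# Print nr_hugepages and free_hugepages for every NUMA node and page size.
# The optional argument (default /sys) is a hypothetical override that lets
# the function run against a test directory.
show_hugepages() {
    sysfs_root="${1:-/sys}"
    for dir in "$sysfs_root"/devices/system/node/node*/hugepages/hugepages-*; do
        [ -d "$dir" ] || continue
        node=$(basename "$(dirname "$(dirname "$dir")")")
        size=$(basename "$dir")
        echo "$node $size nr=$(cat "$dir/nr_hugepages") free=$(cat "$dir/free_hugepages")"
    done
}
```

Calling show_hugepages with no arguments prints one line per node and page size, which makes it easy to compare the pool before and after running the stage.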
Once the pages have been reserved globally in the kernel pool, they are assigned specifically to Firedancer by creating a hugetlbfs mount at each of /mnt/.fd/.gigantic and /mnt/.fd/.huge for gigantic and huge pages respectively. These paths can be configured in the TOML file under the [hugetlbfs] section. Let's run it:
$ sudo fdctl configure init hugetlbfs
NOTICE hugetlbfs ... unconfigured ... mounts `/mnt/.fd/.huge` and `/mnt/.fd/.gigantic` do not exist
NOTICE hugetlbfs ... configuring
NOTICE RUN: `mkdir -p /mnt/.fd/.huge`
NOTICE RUN: `mount -t hugetlbfs none /mnt/.fd/.huge -o pagesize=2097152,min_size=228589568`
NOTICE RUN: `mkdir -p /mnt/.fd/.gigantic`
NOTICE RUN: `mount -t hugetlbfs none /mnt/.fd/.gigantic -o pagesize=1073741824,min_size=27917287424`
$ cat /proc/mounts | grep \\.fd
none /mnt/.fd/.gigantic hugetlbfs rw,seclabel,relatime,pagesize=1024M,min_size=540092137472 0 0
none /mnt/.fd/.huge hugetlbfs rw,seclabel,relatime,pagesize=2M,min_size=95124124 0 0
This stage requires root privileges, and cannot be performed with capabilities. If the required hugetlbfs mounts are already present, with at least the required amount of memory reserved, then the init mode does nothing and the check mode will return successfully without requiring privileges.
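As a worked check of the mount commands shown in the init output above, the min_size option is a byte count, and dividing it by the page size gives a whole number of pages. The pages_for helper below is a hypothetical name used only for this arithmetic.

```shell
#!/bin/sh
# Hypothetical helper: convert a hugetlbfs min_size (bytes) into a page count.
pages_for() {
    echo $(( $1 / $2 ))
}

pages_for 228589568 2097152        # huge pages (2MiB each)
pages_for 27917287424 1073741824   # gigantic pages (1GiB each)
```

For the values in that run, this works out to 109 huge pages and 26 gigantic pages.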
The fini mode will unmount the two filesystems, and remove them from /mnt/.fd/, although it will leave the /mnt/.fd/ directory in place. The fini mode will not succeed if memory from the mounts is mapped into a running process.
If fini succeeds, the huge and gigantic pages that Firedancer had reserved will be returned to the kernel global pool so they can be used by other programs, but the global pool size will not be decreased, even if it was earlier increased during init.
TIP
The hugetlbfs step should be run immediately when the system is booted. If run later, it may fail because the operating system memory is fragmented and a large contiguous block cannot be reserved.
sysctl
It is suggested to run Firedancer with certain kernel parameters tuned for best performance. The sysctl stage will check and configure these parameters. The stage will only increase values to meet the minimum, and will not decrease them if the minimum is already met.
Sysctl | Minimum | Required | Description |
---|---|---|---|
/proc/sys/vm/max_map_count | 1000000 | Yes | Agave accounts database requires mapping many files. |
/proc/sys/fs/file-max | 1024000 | Yes | Agave accounts database requires opening many files. |
/proc/sys/fs/nr_open | 1024000 | Yes | Agave accounts database requires opening many files. |
/proc/sys/net/ipv4/conf/lo/rp_filter | 2 | Yes | If sending QUIC transactions to Firedancer over loopback, this must be enabled to receive a response. Otherwise Linux will drop response packets due to limitations in the kernel eBPF networking stack. The sendTransaction RPC call will send over loopback. |
/proc/sys/net/ipv4/conf/lo/accept_local | 1 | Yes | If sending QUIC transactions to Firedancer over loopback, this must be enabled to receive a response. Otherwise Linux will drop response packets due to limitations in the kernel eBPF networking stack. The sendTransaction RPC call will send over loopback. |
/proc/sys/net/core/bpf_jit_enable | 1 | No | Firedancer uses BPF for kernel bypass networking. BPF JIT makes this faster. |
/proc/sys/kernel/numa_balancing | 0 | No | Firedancer assigns all memory to the right NUMA node, and rebalancing will make the system slower. |
Sysctls that are not required will produce a warning if they are not set correctly, but configuration will proceed and exit normally.
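The minimums in the table can also be inspected by hand. The sketch below is a read-only illustration and does not reproduce what fdctl does internally. The check_min name and its optional root-directory argument are assumptions, and it only covers parameters where the rule is "at least this value".

```shell
#!/bin/sh
# Read-only sketch: report whether a kernel parameter meets a minimum.
# The optional third argument (default /proc/sys) is a hypothetical
# override so the function can be exercised against a test directory.
check_min() {
    name="$1"; min="$2"; root="${3:-/proc/sys}"
    current=$(cat "$root/$name")
    if [ "$current" -ge "$min" ]; then
        echo "ok $name $current"
    else
        echo "low $name $current (minimum $min)"
        return 1
    fi
}
```

For example, check_min vm/max_map_count 1000000 reports whether the first row of the table is already satisfied on the current host.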
The init mode requires either root privileges, or to be run with CAP_SYS_ADMIN. The fini mode does nothing and kernel parameters will never be reduced or changed back as a result of running configure.
hyperthreads
Most work in Firedancer can be scaled with the number of CPU cores, but there are two jobs (tiles) which must run serially on a single core:
- pack Responsible for scheduling transactions for execution when we are leader.
- poh Performs repeated sha256 hashes, and periodically stamps these hashes into in-progress blocks when we are leader.
Because any interruption, context switch, or sharing of the CPU core that these jobs run on could cause skipped leader slots or partially filled blocks, Firedancer expects each of them to get a dedicated core. This means that on machines with a hyperthreaded CPU, the hyperthread pair of the core each of these tiles runs on should be switched to offline.
This stage checks whether the CPU is hyperthreaded, and if so switches the hyperthread pair of each of these tiles to offline. All other CPU cores, if offline, will be switched back to online.
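To find out which logical CPUs share a physical core, the kernel exposes a thread_siblings_list file for each CPU under sysfs. The sketch below simply reads that file. The siblings_of name and its optional root-directory argument are assumptions for illustration, and this is not necessarily how fdctl discovers the pairs.

```shell
#!/bin/sh
# Print the logical CPUs that share a physical core with the given CPU,
# as reported by the kernel in thread_siblings_list (for example "5,37").
# The optional second argument (default /sys) is a hypothetical test override.
siblings_of() {
    cpu="$1"; sysfs_root="${2:-/sys}"
    cat "$sysfs_root/devices/system/cpu/cpu$cpu/topology/thread_siblings_list"
}
```

Running siblings_of 5 on a hyperthreaded machine lists CPU 5 together with its pair. Once the pair has been switched offline, the kernel may no longer list it.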
The stage works by toggling the value in /sys/devices/system/cpu/cpu<id>/online between 0 and 1. We can run the command with a typical auto layout to see:
$ sudo fdctl configure init hyperthreads
NOTICE hyperthreads ... unconfigured ... pack cpu 5 has hyperthread pair 37 which should be offline
NOTICE hyperthreads ... configuring
NOTICE RUN: `echo "0" > /sys/devices/system/cpu/cpu37/online`
NOTICE RUN: `echo "0" > /sys/devices/system/cpu/cpu40/online`
$ cat /sys/devices/system/cpu/cpu37/online
0
When using the auto layout, Firedancer will ensure no other tiles are assigned to run on the hyperthread pairs, but if using a manual layout, it is possible to assign another tile to the pair, in which case configuration will succeed without turning the pair off.
The stage only needs to be run once after boot but before running Firedancer. It has no dependencies on any other stage, although it is dependent on the topology specified in your configuration.
Changing CPUs to offline or online requires root privileges, and cannot be performed with capabilities.
The fini mode will switch all CPUs back to online.
ethtool-channels
In addition to XDP, Firedancer uses receive side scaling (RSS) to improve network performance. This uses functionality of modern NICs to steer packets to different queues to distribute processing among CPUs. See the kernel documentation for more information.
In Firedancer, each net tile serves one network queue, so the ethtool-channels stage will modify the combined channel count of the configured network device [tiles.net.interface] to be the same as the number of net tiles, [layout.net_tile_count]. If your NIC does not support the required number of queues, you will need to reduce the number of net tiles, potentially down to one for NICs which don't support queues at all.
The command run by the stage is similar to running ethtool --set-channels <device> combined <N>, but it also supports bonded devices. We can check that it worked:
$ sudo fdctl configure init ethtool-channels
NOTICE ethtool-channels ... unconfigured ... device `ens3f0` does not have right number of channels (got 1 but expected 2)
NOTICE ethtool-channels ... configuring
NOTICE ethtool-channels ... RUN: `ethtool --set-channels ens3f0 combined 2`
$ ethtool --show-channels ens3f0
Channel parameters for ens3f0:
Pre-set maximums:
RX: 64
TX: 64
Other: 1
Combined: 64
Current hardware settings:
RX: 0
TX: 0
Other: 1
Combined: 2
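The output above lists Combined twice, once under the pre-set maximums and once under the current settings, so a script that wants the active value has to pick the second one. The following is a minimal sketch of that parsing. The current_combined name is hypothetical, and it assumes the output layout shown above.

```shell
#!/bin/sh
# Extract the current "Combined" channel count from the output of
# `ethtool --show-channels <device>`, read on standard input.
# Only the value after "Current hardware settings:" is printed.
current_combined() {
    awk '/^Current hardware settings:/ { current = 1 }
         current && $1 == "Combined:" { print $2; exit }'
}
```

It could be used as ethtool --show-channels ens3f0 | current_combined and the result compared against the number of net tiles in your configuration.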
The stage only needs to be run once after boot but before running Firedancer. It has no dependencies on any other stage, although it is dependent on the number of net tiles in your configuration.
Changing device settings with ethtool-channels requires root privileges, and cannot be performed with capabilities.
ethtool-gro
XDP is incompatible with a feature of network devices called generic-receive-offload. This feature must be disabled for Firedancer to work.
The command run by the stage is similar to running ethtool --offload <device> generic-receive-offload off, but it also supports bonded devices. We can check that it worked:
$ sudo fdctl configure init ethtool-gro
NOTICE ethtool-gro ... unconfigured ... device `ens3f0` has generic-receive-offload enabled. Should be disabled
NOTICE ethtool-gro ... configuring
NOTICE ethtool-gro ... RUN: `ethtool --offload ens3f0 generic-receive-offload off`
$ ethtool --show-offload ens3f0 | grep generic-receive-offload
generic-receive-offload: off
The stage only needs to be run once after boot but before running Firedancer. It has no dependencies on any other stage.
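The grep check above can be wrapped in a small helper that returns just the state of one offload feature, which is convenient in scripts. This is a sketch only. The offload_state name is hypothetical, and it assumes the "feature: state" line format shown in the ethtool output above.

```shell
#!/bin/sh
# Print the state ("on" or "off") of one offload feature from the output
# of `ethtool --show-offload <device>`, read on standard input.
offload_state() {
    feature="$1"
    awk -v f="$feature:" '$1 == f { print $2; exit }'
}
```

For example, ethtool --show-offload ens3f0 | offload_state generic-receive-offload prints off once the stage has been applied. The same helper can be pointed at tx-udp-segmentation on the lo device for the ethtool-loopback stage.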
Changing device settings with ethtool-gro requires root privileges, and cannot be performed with capabilities.
ethtool-loopback
XDP is incompatible with localhost UDP traffic using a feature called tx-udp-segmentation. This feature must be disabled when connecting Agave clients to Firedancer over loopback, or when using Frankendancer.
The command run by the stage is ethtool --offload lo tx-udp-segmentation off. We can check that it worked:
$ sudo fdctl configure init ethtool-loopback
NOTICE ethtool-loopback ... unconfigured ... device `lo` has tx-udp-segmentation enabled. Should be disabled
NOTICE ethtool-loopback ... configuring
NOTICE ethtool-loopback ... RUN: `ethtool --offload lo tx-udp-segmentation off`
$ ethtool --show-offload lo | grep tx-udp-segmentation
tx-udp-segmentation: off
The stage only needs to be run once after boot but before running Firedancer. It has no dependencies on any other stage.
Changing device settings with ethtool-loopback requires root privileges, and cannot be performed with capabilities.