Performance

When you build your code, you can tweak how Cargo compiles it in several ways that can result in better runtime performance.

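Cargo has four built-in profiles: dev, release, test and bench. The two you will use most are dev, which cargo build uses, and release, which cargo build --release uses. They come with sensible defaults, but you can override them in your Cargo.toml, or even create your own custom profiles. For example, the release profile uses these defaults: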

[profile.release]
opt-level = 3
debug = false
split-debuginfo = '...'  # Platform-specific.
strip = "none"
debug-assertions = false
overflow-checks = false
lto = false
panic = 'unwind'
incremental = false
codegen-units = 16
rpath = false
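Some of these settings can be observed directly from Rust code. The program below is a small sketch of this; the `build_summary` helper is a hypothetical name. It reports whether debug assertions are enabled, and it uses overflow-safe arithmetic whose behavior does not depend on the `overflow-checks` setting. Running it with `cargo run` and then `cargo run --release` should show the difference, assuming the default profiles have not been overridden.

```rust
/// Describes one profile-controlled setting of the current build.
fn build_summary() -> String {
    // `cfg!(debug_assertions)` is true when the active profile sets
    // `debug-assertions = true` (the `dev` default) and false when it is
    // `false` (the `release` default).
    let state = if cfg!(debug_assertions) { "on" } else { "off" };
    format!("debug assertions: {state}")
}

fn main() {
    println!("{}", build_summary());

    // `debug_assert!` is compiled out when debug assertions are off,
    // so this check costs nothing in a default release build.
    debug_assert!(build_summary().starts_with("debug"), "unexpected summary");

    // With `overflow-checks = true` (the `dev` default), a runtime expression
    // like `x + 1` would panic here; with `overflow-checks = false` (the
    // `release` default) it would silently wrap to 0. The explicit methods
    // below behave identically under every profile.
    let x: u8 = 255;
    println!("checked_add: {:?}", x.checked_add(1)); // checked_add: None
    println!("wrapping_add: {}", x.wrapping_add(1)); // wrapping_add: 0
}
```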

When you enable Link-Time Optimization (LTO), you ask the compiler to run additional optimization passes. These run not only while building each individual crate, but also when linking all of your crates together into the final binary. At that point, the optimizer sees the whole program, including exactly which code is actually called and which is not.

LTO can therefore remove unused code and shrink the binary, which can also make it faster. It also allows functions to be inlined across crate boundaries, which can give your code a further speed boost. The trade-off is longer build times.

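To enable full ("fat") LTO, set the lto property in your profile to true or to "fat". Cargo also supports lto = "thin", which takes substantially less time to run while often achieving similar performance gains.

[profile.release]
lto = "fat"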
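LTO gives the optimizer this cross-crate view automatically. A library author can get part of the cross-crate inlining benefit without LTO by marking small, hot, non-generic functions with `#[inline]`. The sketch below illustrates this. To keep it in one file, a module stands in for a separate library crate (an assumption), and `geometry`, `Point`, `dot` and `sum_all` are hypothetical names.

```rust
// In a real project, this module would be a separate library crate
// that your binary depends on.
pub mod geometry {
    #[derive(Clone, Copy, Debug)]
    pub struct Point {
        pub x: f64,
        pub y: f64,
    }

    /// Small, hot helper. Without `#[inline]` (and without LTO), the body of a
    /// non-generic function is generally not available for inlining into
    /// other crates; `#[inline]` makes it available to downstream crates.
    #[inline]
    pub fn dot(a: Point, b: Point) -> f64 {
        a.x * b.x + a.y * b.y
    }

    /// Generic functions are instantiated in the calling crate, so they can
    /// generally be inlined across crates even without `#[inline]`.
    pub fn sum_all<I: IntoIterator<Item = f64>>(values: I) -> f64 {
        values.into_iter().sum()
    }
}

use geometry::{dot, sum_all, Point};

fn main() {
    let a = Point { x: 1.0, y: 2.0 };
    let b = Point { x: 3.0, y: 4.0 };
    println!("dot = {}", dot(a, b)); // dot = 11
    println!("sum = {}", sum_all([1.0, 2.0, 3.0])); // sum = 6
}
```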

Enabling Target Features

When you compile your Cargo crate, the compiler generates code for a specific target platform. Typically, this is the x86_64-unknown-linux-gnu target. The first part of this target triple, x86_64 (commonly called amd64), is the type of processor that your code will run on.

Modern AMD64 processors have an array of extensions that can speed up certain operations, such as hardware support for AES through AES-NI, or SIMD support through AVX2. To keep your program compatible with as many processors as possible, the compiler does not use these extra instructions by default unless you tell it to.

You can enable these extra instructions (called target features) by adding them to your Cargo configuration file at .cargo/config.toml within your repository:

[target.x86_64-unknown-linux-gnu]
# Each array element is passed to rustc as one argument, so "-C" and its value are separate.
rustflags = ["-C", "target-feature=+avx2"]

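Note that these flags only affect which instructions the compiler emits on its own. Crates can check at compile time whether a feature is enabled, using cfg(target_feature = "avx2"). Some crates also detect features at runtime and switch to whichever implementation works best on your processor, regardless of these flags. Keep in mind that a binary built with extra target features may crash with an illegal instruction error on processors that lack them.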
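The sketch below makes the compile-time versus runtime distinction concrete. It relies on the standard library's `is_x86_feature_detected!` macro and the `#[target_feature]` attribute; the function names are hypothetical. The AVX2 path is the same loop as the fallback, compiled with AVX2 enabled for that one function. It relies on the compiler choosing to auto-vectorize it, which is not guaranteed.

```rust
/// Portable fallback that works on any CPU.
fn sum_scalar(xs: &[u32]) -> u64 {
    xs.iter().map(|&x| u64::from(x)).sum()
}

/// Same loop, but this one function is compiled with AVX2 enabled, so the
/// compiler may auto-vectorize it using AVX2 instructions.
/// Unsafe to call: the caller must ensure the CPU supports AVX2.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[u32]) -> u64 {
    xs.iter().map(|&x| u64::from(x)).sum()
}

/// On x86 targets, picks an implementation at runtime based on the CPU.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn sum(xs: &[u32]) -> u64 {
    if is_x86_feature_detected!("avx2") {
        // SAFETY: we just checked that this CPU supports AVX2.
        unsafe { sum_avx2(xs) }
    } else {
        sum_scalar(xs)
    }
}

/// On other architectures, always uses the portable version.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn sum(xs: &[u32]) -> u64 {
    sum_scalar(xs)
}

fn main() {
    // Compile-time check: true only if the crate itself was built with
    // `+avx2`, for example via the `rustflags` setting above.
    println!("compiled with +avx2: {}", cfg!(target_feature = "avx2"));

    let data: Vec<u32> = (1..=1000).collect();
    println!("sum = {}", sum(&data)); // sum = 500500
}
```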

Profile-Guided Optimization

Profile-Guided Optimization (PGO) gives the compiler better context for optimizing your program. It works in three steps:

1. Compile the program with instrumentation.
2. Run representative workloads, while the instrumentation records which branches are taken and which functions are called most often.
3. Recompile the program using this profile data.

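When the compiler knows which branches are commonly taken and which functions are called frequently, it can sometimes emit faster code. For example, it can make better inlining decisions and lay out code so that the common path runs without jumps.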
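PGO collects branch and call frequencies automatically. To illustrate the kind of information it feeds the compiler, the sketch below supplies a coarse version of it by hand, using the stable `#[cold]` attribute to mark a function as unlikely to be called. This is a manual hint, not PGO itself, and the parser and its helper are hypothetical.

```rust
/// Parses one decimal digit. Malformed input is assumed to be rare.
fn parse_digit(c: char) -> Result<u32, String> {
    match c.to_digit(10) {
        Some(d) => Ok(d),               // hot path: valid digit
        None => Err(invalid_digit(c)),  // cold path: error handling
    }
}

/// `#[cold]` tells the compiler this function is rarely called, so it can
/// keep it out of the hot path. PGO would infer this (and much more) from
/// real runs without any manual annotations.
#[cold]
#[inline(never)]
fn invalid_digit(c: char) -> String {
    format!("not a decimal digit: {c:?}")
}

fn main() {
    let parsed: Vec<Result<u32, String>> = "2024x".chars().map(parse_digit).collect();
    println!("{parsed:?}");
}
```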

Post-Link Optimization

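Post-link optimization optimizes a binary after it has been fully compiled and linked. A tool such as LLVM's BOLT uses profiling data gathered from running the binary to rearrange its code layout, for example to place frequently executed code close together.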

Using a Different Allocator

In programs that perform many heap allocations (which is most programs these days), the memory allocator can become a performance bottleneck. Replacing the default allocator with a different one can help.
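In Rust, a binary replaces its global allocator with the `#[global_allocator]` attribute, which is how allocator crates such as `mimalloc` and `tikv-jemallocator` are used. To keep the sketch self-contained, it does not link a third-party allocator. Instead it wraps the standard `System` allocator and counts allocations, which can be a first step toward finding out whether a program allocates enough for an allocator swap to matter. `CountingAlloc` and `allocations` are hypothetical names.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper around the system allocator that counts allocations.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // SAFETY: the request is forwarded unchanged to the system allocator.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: `ptr` was allocated by `System` via `alloc` above.
        unsafe { System.dealloc(ptr, layout) }
    }
    // The default `realloc` falls back to `alloc` + copy + `dealloc`,
    // so buffer growth is counted as well.
}

// Installs the wrapper for the whole program. Swapping in another allocator
// uses the same attribute, e.g. (assuming the `mimalloc` crate is a dependency):
// #[global_allocator] static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Total number of allocations made so far by this process.
fn allocations() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}

fn main() {
    let before = allocations();
    let strings: Vec<String> = (0..1000).map(|i| i.to_string()).collect();
    let made = allocations() - before;
    println!("built {} strings with {} allocations", strings.len(), made);
}
```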

Reading

https://doc.rust-lang.org/cargo/reference/profiles.html

https://kobzol.github.io/rust/cargo/2023/07/28/rust-cargo-pgo.html

https://doc.rust-lang.org/rustc/profile-guided-optimization.html

https://blog.rust-lang.org/inside-rust/2020/11/11/exploring-pgo-for-the-rust-compiler.html

https://github.com/Kobzol/cargo-pgo

https://github.com/llvm/llvm-project/tree/main/bolt

https://rustc-dev-guide.rust-lang.org/building/optimized-build.html