Performance
Rust often attracts people who care about performance. Often, though, performance is not the end goal: higher performance means higher efficiency, and in an era of cloud computing, that translates into lower costs per request.
Performance optimization is a large subject, and this book will not cover it in depth. Other books, such as the Rust Performance Book, do a better job of summarizing what can be done to optimize applications. But this book does make the point that performance should be tested and tracked over time: that is the only way to ensure that a project is heading in the right direction and not regressing.
The way you do that in Rust is by writing benchmarks. Cargo comes with built-in support for doing so, and while its built-in benchmarking harness is still unstable, there are crates that let you easily build benchmarks for both blocking and async code and track their performance over time.
Writing benchmarks makes it easy to experiment with different ways of implementing a feature, because it makes the performance differences between approaches easy to compare. Another application is tracking the performance of your code over time, by running benchmarks on every commit or periodically, using a platform such as Bencher or the Continuous Benchmark GitHub Action.
Often, performance is a tradeoff. While Rust has zero-cost abstractions that let you write simple code that is still fast, there are many situations where you have to choose between a simpler implementation that carries some tech debt, and doing it properly at the cost of more development time or more complex code. The only way to make these decisions well is to have data for them. How much runtime performance are you giving up by keeping the simple implementation? How much are you gaining with the more complex one? Projects should make these decisions based on measurements, not guesses.
Criterion
Typically, you write these benchmarks using the criterion crate[^1]. It lets you benchmark both synchronous and asynchronous code, and it provides statistical analysis of the benchmark results. The Rust standard library also has some benchmarking support, but it is currently a nightly-only feature.
Examples
TODO:
- simple benchmarking with criterion (a sketch follows below)
- async benchmarking with criterion (a sketch follows below)
- benchmarking published to bencher
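A minimal sketch of a simple criterion benchmark. The `fibonacci` function and the benchmark name are placeholders; the file would live at `benches/fibonacci.rs`, with criterion listed under `[dev-dependencies]` and the benchmark registered in `Cargo.toml` with `harness = false`:

```rust
// benches/fibonacci.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Hypothetical function under test.
fn fibonacci(n: u64) -> u64 {
    match n {
        0 | 1 => 1,
        n => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

fn bench_fibonacci(c: &mut Criterion) {
    // black_box keeps the compiler from optimizing the input away.
    c.bench_function("fib 20", |b| b.iter(|| fibonacci(black_box(20))));
}

criterion_group!(benches, bench_fibonacci);
criterion_main!(benches);
```

Running `cargo bench` executes the benchmark and prints a statistical summary, including a comparison with the previous run.

Async code can be benchmarked in much the same way. This sketch assumes the Tokio runtime and criterion's `async_tokio` feature; `fetch_value` stands in for whatever async operation you want to measure:

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use tokio::runtime::Runtime;

// Hypothetical async operation under test.
async fn fetch_value() -> u64 {
    tokio::task::yield_now().await;
    42
}

fn bench_fetch_value(c: &mut Criterion) {
    let rt = Runtime::new().unwrap();
    // to_async runs each iteration of the benchmark on the given runtime.
    c.bench_function("fetch_value", |b| b.to_async(&rt).iter(|| fetch_value()));
}

criterion_group!(benches, bench_fetch_value);
criterion_main!(benches);
```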
Valgrind
Valgrind-based tools such as cachegrind count CPU instructions instead of measuring wall-clock time. This makes measurements repeatable: two runs on the same architecture produce (nearly) identical numbers, regardless of how loaded the machine is.
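The iai crate (and its successor iai-callgrind, both linked in the reading list below) builds on this idea: it runs benchmarks under Cachegrind and reports instruction counts rather than times, which requires Valgrind to be installed. A minimal sketch, reusing the hypothetical `fibonacci` function from above and again assuming `harness = false` for the benchmark target:

```rust
use iai::{black_box, main};

// Hypothetical function under test.
fn fibonacci(n: u64) -> u64 {
    match n {
        0 | 1 => 1,
        n => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

fn iai_fibonacci() -> u64 {
    fibonacci(black_box(20))
}

main!(iai_fibonacci);
```

Because instruction counts are deterministic, this approach is a good fit for CI, where noisy shared runners make wall-clock measurements unreliable.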
Flamegraph
Debugging Performance
So, what do you do if you notice that your Rust code is not performing well? There are some common issues you might run into:
- Build mode: Are you building your code in release mode (e.g. `cargo build --release`)? It makes a large difference for Rust projects.
- Optimization level: Have you changed the optimization level, for example to optimize for size rather than speed? This can also make a large difference.
- Link-time optimization: Have you tried enabling `lto` in your compilation profile? (See the profile sketch after this list.)
- Build target: Are you building for musl libc instead of glibc (e.g. `--target x86_64-unknown-linux-musl`)? Musl tends to produce slower code.
- Allocator: Is your application allocation-heavy? Then try using `jemallocator` (see the sketch after this list); it might give you a performance boost.
- Data structures: Have you tried using different data structures? For example, the hashbrown crate has a `HashMap` implementation that can be significantly faster than the standard library's, mainly because it defaults to a faster (but not DoS-resistant) hash function.
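As a sketch of what a speed-tuned compilation profile might look like in `Cargo.toml` (the exact values are illustrative, not a recommendation):

```toml
[profile.release]
opt-level = 3      # optimize for speed; "s" or "z" would optimize for size
lto = true         # enable link-time optimization across crates
codegen-units = 1  # fewer codegen units give the optimizer more room
```

And swapping in jemalloc for allocation-heavy applications is a one-liner with the jemallocator crate:

```rust
// Use jemalloc as the global allocator for the whole program.
use jemallocator::Jemalloc;

#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;
```

Whether these changes help depends on the workload, which is exactly why you want benchmarks in place before flipping them.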
If these didn’t fix your performance issues, the next step is to find out why your performance isn’t good. When it comes to improving performance, the best thing you can do is be guided by data rather than intuition. There are many micro-optimizations that yield negligible benefits. Being guided by data lets you focus on the optimizations that matter most: per Amdahl’s law, the speedup you can get from optimizing one part of a program is limited by the share of total runtime that part accounts for. If a function accounts for only 5% of the runtime, even making it infinitely fast speeds up the program by at most roughly 5%.
Visualizing Performance
To understand where you are losing performance, you want insight into which parts of your program are responsible for the majority of the runtime. This tells you where to focus your attention when trying optimization approaches.
cargo-flamegraph is a Cargo subcommand that visualizes which code in your project takes up the majority of the runtime.
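A possible invocation, assuming a hypothetical binary target named `myapp` (cargo-flamegraph relies on `perf` on Linux and `dtrace` on macOS):

```sh
# Install the subcommand once.
cargo install flamegraph

# Profile the binary and write an interactive flamegraph.svg.
cargo flamegraph --bin myapp
```

The resulting SVG shows the sampled call stacks, with the widest frames corresponding to the functions where the most time was spent.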
Reading
The Rust Performance Book summarizes various approaches to benchmarking and profiling code, and offers suggestions for improving performance.
The Criterion Book explains how to get started using Criterion, and what features it has.
Benchmark It! by Ryan James Spencer
Ryan argues in this blog post that you should benchmark your code: users can feel performance, so you should care about it. He explains how to get started writing performance benchmarks in Rust using criterion.
This blog post from Bencher explains the concept of continuous benchmarking. It also addresses some myths surrounding benchmarking, for example about benchmarking in CI.
Continuous benchmarking for rustls by Adolfo Ochagavía
Adolfo explains in this blog post how he implemented continuous benchmarking for the rustls library, and how he leveraged it to find performance regressions easily. He explains that using cachegrind was instrumental, because it can count CPU instructions and easily diff them per function across benchmark runs, which allows tracking down which function introduced a regression.
Making slow Rust code fast by Patrick Freed
Guidelines on Benchmarking and Rust
Benchmarking and analyzing Rust performance with Criterion and iai
Benchmarking Rust code using Criterion-rs
Windtunnel CI
https://lib.rs/crates/iai-callgrind
https://github.com/bheisler/iai
https://www.magiroux.com/rust-jemalloc-profiling/
https://kobzol.github.io/rust/rustc/2023/08/18/rustc-benchmark-suite.html
https://blog.anp.lol/rust/2016/07/24/profiling-rust-perf-flamegraph/
Benchmarking in The Rust Performance Book