| <a id="top"></a> | 
 | # Authoring benchmarks | 
 |  | 
 | > [Introduced](https://github.com/catchorg/Catch2/issues/1616) in Catch 2.9.0. | 
 |  | 
 | _Note that benchmarking support is disabled by default and to enable it, | 
 | you need to define `CATCH_CONFIG_ENABLE_BENCHMARKING`. For more details, | 
 | see the [compile-time configuration documentation](configuration.md#top)._ | 
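
For example, when using the single-header distribution, one way to enable
benchmarking is to define the macro before including the header (passing
`-DCATCH_CONFIG_ENABLE_BENCHMARKING` on the compiler command line works as well):

```c++
// Define the configuration macro before including Catch; it must be
// defined consistently in every translation unit that includes the header.
#define CATCH_CONFIG_ENABLE_BENCHMARKING
#include "catch.hpp"
```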
 |  | 
Writing benchmarks is not easy. Catch simplifies certain aspects, but you'll
always need to take care of various details yourself. Understanding a few things
about the way Catch runs your code will be very helpful when writing your benchmarks.
 |  | 
 | First off, let's go over some terminology that will be used throughout this | 
 | guide. | 
 |  | 
 | - *User code*: user code is the code that the user provides to be measured. | 
 | - *Run*: one run is one execution of the user code. | 
 | - *Sample*: one sample is one data point obtained by measuring the time it takes | 
 |   to perform a certain number of runs. One sample can consist of more than one | 
 |   run if the clock available does not have enough resolution to accurately | 
 |   measure a single run. All samples for a given benchmark execution are obtained | 
 |   with the same number of runs. | 
 |  | 
 | ## Execution procedure | 
 |  | 
 | Now I can explain how a benchmark is executed in Catch. There are three main | 
 | steps, though the first does not need to be repeated for every benchmark. | 
 |  | 
 | 1. *Environmental probe*: before any benchmarks can be executed, the clock's | 
 | resolution is estimated. A few other environmental artifacts are also estimated | 
 | at this point, like the cost of calling the clock function, but they almost | 
never have any impact on the results.
 |  | 
 | 2. *Estimation*: the user code is executed a few times to obtain an estimate of | 
the number of runs that should be in each sample. This also has the potential
 | effect of bringing relevant code and data into the caches before the actual | 
 | measurement starts. | 
 |  | 
 | 3. *Measurement*: all the samples are collected sequentially by performing the | 
 | number of runs estimated in the previous step for each sample. | 
 |  | 
 | This already gives us one important rule for writing benchmarks for Catch: the | 
 | benchmarks must be repeatable. The user code will be executed several times, and | 
 | the number of times it will be executed during the estimation step cannot be | 
 | known beforehand since it depends on the time it takes to execute the code. | 
 | User code that cannot be executed repeatedly will lead to bogus results or | 
 | crashes. | 
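
As a hypothetical illustration, the following benchmark is not repeatable,
because each run consumes state that is set up only once (`make_data` stands in
for any setup that returns a non-empty vector):

```c++
std::vector<int> v = make_data(); // make_data() is a hypothetical setup helper

BENCHMARK("pop_back") {
    // Each run removes one element; once the vector is empty, further runs
    // call pop_back() on an empty vector, which is undefined behaviour --
    // expect bogus results or a crash.
    v.pop_back();
};
```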
 |  | 
 | ## Benchmark specification | 
 |  | 
 | Benchmarks can be specified anywhere inside a Catch test case. | 
 | There is a simple and a slightly more advanced version of the `BENCHMARK` macro. | 
 |  | 
Let's have a look at how a naive Fibonacci implementation could be benchmarked:
 | ```c++ | 
 | std::uint64_t Fibonacci(std::uint64_t number) { | 
 |     return number < 2 ? 1 : Fibonacci(number - 1) + Fibonacci(number - 2); | 
 | } | 
 | ``` | 
Now the most straightforward way to benchmark this function is to just add a `BENCHMARK` macro to our test case:
 | ```c++ | 
 | TEST_CASE("Fibonacci") { | 
 |     CHECK(Fibonacci(0) == 1); | 
 |     // some more asserts.. | 
 |     CHECK(Fibonacci(5) == 8); | 
 |     // some more asserts.. | 
 |  | 
 |     // now let's benchmark: | 
 |     BENCHMARK("Fibonacci 20") { | 
 |         return Fibonacci(20); | 
 |     }; | 
 |  | 
 |     BENCHMARK("Fibonacci 25") { | 
 |         return Fibonacci(25); | 
 |     }; | 
 |  | 
 |     BENCHMARK("Fibonacci 30") { | 
 |         return Fibonacci(30); | 
 |     }; | 
 |  | 
 |     BENCHMARK("Fibonacci 35") { | 
 |         return Fibonacci(35); | 
 |     }; | 
 | } | 
 | ``` | 
There are a few things to note:
- As `BENCHMARK` expands to a lambda expression, it is necessary to add a semicolon after
 the closing brace (as opposed to the first experimental version).
 | - The `return` is a handy way to avoid the compiler optimizing away the benchmark code. | 
 |  | 
Running this test case executes the benchmarks and outputs something similar to:
 | ``` | 
 | ------------------------------------------------------------------------------- | 
 | Fibonacci | 
 | ------------------------------------------------------------------------------- | 
 | C:\path\to\Catch2\Benchmark.tests.cpp(10) | 
 | ............................................................................... | 
 | benchmark name                                  samples       iterations    estimated | 
 |                                                 mean          low mean      high mean | 
 |                                                 std dev       low std dev   high std dev | 
 | ------------------------------------------------------------------------------- | 
 | Fibonacci 20                                            100       416439   83.2878 ms | 
 |                                                        2 ns         2 ns         2 ns | 
 |                                                        0 ns         0 ns         0 ns | 
 |  | 
 | Fibonacci 25                                            100       400776   80.1552 ms | 
 |                                                        3 ns         3 ns         3 ns | 
 |                                                        0 ns         0 ns         0 ns | 
 |  | 
 | Fibonacci 30                                            100       396873   79.3746 ms | 
 |                                                       17 ns        17 ns        17 ns | 
 |                                                        0 ns         0 ns         0 ns | 
 |  | 
 | Fibonacci 35                                            100       145169   87.1014 ms | 
 |                                                      468 ns       464 ns       473 ns | 
 |                                                       21 ns        15 ns        34 ns | 
 | ``` | 
 |  | 
 | ### Advanced benchmarking | 
The simplest use case shown above takes no arguments and just runs the user code that needs to be measured.
 | However, if using the `BENCHMARK_ADVANCED` macro and adding a `Catch::Benchmark::Chronometer` argument after | 
 | the macro, some advanced features are available. The contents of the simple benchmarks are invoked once per run, | 
 | while the blocks of the advanced benchmarks are invoked exactly twice: | 
 | once during the estimation phase, and another time during the execution phase. | 
 |  | 
 | ```c++ | 
 | BENCHMARK("simple"){ return long_computation(); }; | 
 |  | 
 | BENCHMARK_ADVANCED("advanced")(Catch::Benchmark::Chronometer meter) { | 
 |     set_up(); | 
 |     meter.measure([] { return long_computation(); }); | 
 | }; | 
 | ``` | 
 |  | 
 | These advanced benchmarks no longer consist entirely of user code to be measured. | 
 | In these cases, the code to be measured is provided via the | 
 | `Catch::Benchmark::Chronometer::measure` member function. This allows you to set up any | 
 | kind of state that might be required for the benchmark but is not to be included | 
 | in the measurements, like making a vector of random integers to feed to a | 
 | sorting algorithm. | 
 |  | 
 | A single call to `Catch::Benchmark::Chronometer::measure` performs the actual measurements | 
 | by invoking the callable object passed in as many times as necessary. Anything | 
 | that needs to be done outside the measurement can be done outside the call to | 
 | `measure`. | 
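
For example, a sketch of a benchmark that builds its input outside the measured
region might look like this (the setup code is illustrative and needs `<vector>`,
`<numeric>` and `<algorithm>`):

```c++
BENCHMARK_ADVANCED("binary search")(Catch::Benchmark::Chronometer meter) {
    // Setup outside measure() is excluded from the timings:
    // build a sorted vector to search in.
    std::vector<int> data(1000);
    std::iota(data.begin(), data.end(), 0);
    // Only the search itself is measured; returning the result keeps
    // the optimizer from discarding it.
    meter.measure([&data] {
        return std::binary_search(data.begin(), data.end(), 500);
    });
};
```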
 |  | 
 | The callable object passed in to `measure` can optionally accept an `int` | 
 | parameter. | 
 |  | 
 | ```c++ | 
 | meter.measure([](int i) { return long_computation(i); }); | 
 | ``` | 
 |  | 
 | If it accepts an `int` parameter, the sequence number of each run will be passed | 
 | in, starting with 0. This is useful if you want to measure some mutating code, | 
 | for example. The number of runs can be known beforehand by calling | 
 | `Catch::Benchmark::Chronometer::runs`; with this one can set up a different instance to be | 
 | mutated by each run. | 
 |  | 
 | ```c++ | 
 | std::vector<std::string> v(meter.runs()); | 
 | std::fill(v.begin(), v.end(), test_string()); | 
 | meter.measure([&v](int i) { in_place_escape(v[i]); }); | 
 | ``` | 
 |  | 
Note that it is not possible to simply use the same instance for different runs
and reset it between runs, since that would pollute the measurements with the
resetting code.
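
Putting these pieces together, a benchmark of an in-place sorting algorithm
might look like the following sketch, where `make_random_ints` is a hypothetical
helper (not part of Catch) that returns a vector of random integers:

```c++
BENCHMARK_ADVANCED("sort")(Catch::Benchmark::Chronometer meter) {
    // One fresh copy of the unsorted input per run, so every run sorts
    // unsorted data and no resetting code pollutes the measurements.
    // make_random_ints is a hypothetical helper, not part of Catch.
    std::vector<std::vector<int>> inputs(meter.runs(), make_random_ints(1000));
    meter.measure([&inputs](int i) {
        std::sort(inputs[i].begin(), inputs[i].end());
    });
};
```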
 |  | 
 | It is also possible to just provide an argument name to the simple `BENCHMARK` macro to get | 
the same semantics as providing a callable to `meter.measure` with an `int` argument:
 |  | 
 | ```c++ | 
 | BENCHMARK("indexed", i){ return long_computation(i); }; | 
 | ``` | 
 |  | 
 | ### Constructors and destructors | 
 |  | 
All of these tools give you a lot of mileage, but there are two things that still
need special handling: constructors and destructors. The problem is that if you
use automatic objects, they get destroyed at the end of the scope, so you end up
 | measuring the time for construction and destruction together. And if you use | 
 | dynamic allocation instead, you end up including the time to allocate memory in | 
 | the measurements. | 
 |  | 
 | To solve this conundrum, Catch provides class templates that let you manually | 
 | construct and destroy objects without dynamic allocation and in a way that lets | 
 | you measure construction and destruction separately. | 
 |  | 
 | ```c++ | 
 | BENCHMARK_ADVANCED("construct")(Catch::Benchmark::Chronometer meter) { | 
 |     std::vector<Catch::Benchmark::storage_for<std::string>> storage(meter.runs()); | 
 |     meter.measure([&](int i) { storage[i].construct("thing"); }); | 
 | }; | 
 |  | 
 | BENCHMARK_ADVANCED("destroy")(Catch::Benchmark::Chronometer meter) { | 
 |     std::vector<Catch::Benchmark::destructable_object<std::string>> storage(meter.runs()); | 
 |     for(auto&& o : storage) | 
 |         o.construct("thing"); | 
 |     meter.measure([&](int i) { storage[i].destruct(); }); | 
 | }; | 
 | ``` | 
 |  | 
 | `Catch::Benchmark::storage_for<T>` objects are just pieces of raw storage suitable for `T` | 
 | objects. You can use the `Catch::Benchmark::storage_for::construct` member function to call a constructor and | 
 | create an object in that storage. So if you want to measure the time it takes | 
 | for a certain constructor to run, you can just measure the time it takes to run | 
 | this function. | 
 |  | 
 | When the lifetime of a `Catch::Benchmark::storage_for<T>` object ends, if an actual object was | 
 | constructed there it will be automatically destroyed, so nothing leaks. | 
 |  | 
If you want to measure a destructor, though, you need to use
 | `Catch::Benchmark::destructable_object<T>`. These objects are similar to | 
 | `Catch::Benchmark::storage_for<T>` in that construction of the `T` object is manual, but | 
 | it does not destroy anything automatically. Instead, you are required to call | 
 | the `Catch::Benchmark::destructable_object::destruct` member function, which is what you | 
 | can use to measure the destruction time. | 
 |  | 
 | ### The optimizer | 
 |  | 
 | Sometimes the optimizer will optimize away the very code that you want to | 
measure. There are several ways to use results that will prevent the optimizer
 | from removing them. You can use the `volatile` keyword, or you can output the | 
 | value to standard output or to a file, both of which force the program to | 
 | actually generate the value somehow. | 
 |  | 
 | Catch adds a third option. The values returned by any function provided as user | 
code are guaranteed to be evaluated and not optimized out. This means that if
 | your user code consists of computing a certain value, you don't need to bother | 
 | with using `volatile` or forcing output. Just `return` it from the function. | 
This helps keep the code natural.
 |  | 
 | Here's an example: | 
 |  | 
 | ```c++ | 
 | // may measure nothing at all by skipping the long calculation since its | 
 | // result is not used | 
 | BENCHMARK("no return"){ long_calculation(); }; | 
 |  | 
 | // the result of long_calculation() is guaranteed to be computed somehow | 
 | BENCHMARK("with return"){ return long_calculation(); }; | 
 | ``` | 
 |  | 
 | However, there's no other form of control over the optimizer whatsoever. It is | 
 | up to you to write a benchmark that actually measures what you want and doesn't | 
 | just measure the time to do a whole bunch of nothing. | 
 |  | 
 | To sum up, there are two simple rules: whatever you would do in handwritten code | 
 | to control optimization still works in Catch; and Catch makes return values | 
 | from user code into observable effects that can't be optimized away. | 
 |  | 
 | <i>Adapted from nonius' documentation.</i> |