doc/development.md - platform/external/n2 - Git at Google

 ## Git hook

 On a new checkout, run this to install the formatting check hook:

 ```
 $ ln -s ../../git-pre-commit .git/hooks/pre-commit
 ```

 ## Path handling and Unicode safety

 See the longer discussion of Unicode in general in the
 [design notes](design_notes.md).

 Concretely, we currently use Rust `String` for all paths and file contents, but
 internally interpret them as as bytes (not UTF-8) including using `unsafe`
 sometimes to convert.

 Based on my superficial understanding of how safety relates to UTF-8 in Rust
 strings, it's probably harmless given that we never treat strings as Unicode,
 but it's also possible some code outside of our control relies on this. But it
 does mean there's a bunch of kind of needless `unsafe`s in the code, and some of
 them are possibly actually doing something bad.

 We could fix this by switching to using a bag of bytes type, like
 https://crates.io/crates/bstr. But it is pretty invasive. We would need to use
 that not only for paths but also console output, error messages, etc. And it's
 not clear (again, see above design notes discussion) that using bags of bytes is
 the desired end state, so it's probably not worth doing.

 ## Profiling

 ### gperftools

 I played with a few profilers, but I think the gperftools profiler turned out to
 be significantly better than the others. To install:

 ```
 $ apt install libgoogle-perftools-dev
 $ go install github.com/google/pprof@latest
 ```

 To use:

 ```
 [possibly modify main.rs to make the app do more work than normal]
 $ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libprofiler.so CPUPROFILE=p ./target/release/n2 ...
 $ pprof -http=:8080 ./target/release/n2 p
 ```

 The web server it brings up shows an interactive graph, top functions, annotated
 code, disassembly...

 ### Mac

 ```
 $ cargo instruments --release --template time --bin n2 -- -C ~/projects/llvm-project-16.0.0.src/build clang-format
 ```

 TODO: notes on this vs `cargo flamegraph`.

 ### Windows

 It appears `perf` profiling of Rust under WSL2 is not a thing(?).

 ## Benchmarking

 ### Simple

 This benchmarks end-to-end n2 load time, by asking to build a nonexistent target:

 1. `cargo install hyperfine`
 2. `$ hyperfine -i -- './target/release/n2 -C ~/llvm-project/llvm/utils/gn/out/ xxx'`

 ### Micro

 There are microbenchmarks in the `benches/` directory.  Run them with:

 ```
 $ cargo bench
 ```

 If there is a `build.ninja` in the `benches/` directory, the parsing benchmark
 will load it.  For example, you can copy the `build.ninja` generated by the
 LLVM CMake build system (66mb on my system!).

 To run just the parsing benchmark:

 ```
 $ cargo bench --bench parse -- parse
 ```

 When iterating on benchmarks, it can help build time to disable `lto` in release
 mode by commenting out the `lto =` line in `Cargo.toml`.  (On my system, `lto`
 is worth ~13% of parsing performance.)
	## Git hook

	On a new checkout, run this to install the formatting check hook:

	```
	$ ln -s ../../git-pre-commit .git/hooks/pre-commit
	```

	## Path handling and Unicode safety

	See the longer discussion of Unicode in general in the
	[design notes](design_notes.md).

	Concretely, we currently use Rust `String` for all paths and file contents, but
	internally interpret them as as bytes (not UTF-8) including using `unsafe`
	sometimes to convert.

	Based on my superficial understanding of how safety relates to UTF-8 in Rust
	strings, it's probably harmless given that we never treat strings as Unicode,
	but it's also possible some code outside of our control relies on this. But it
	does mean there's a bunch of kind of needless `unsafe`s in the code, and some of
	them are possibly actually doing something bad.

	We could fix this by switching to using a bag of bytes type, like
	https://crates.io/crates/bstr. But it is pretty invasive. We would need to use
	that not only for paths but also console output, error messages, etc. And it's
	not clear (again, see above design notes discussion) that using bags of bytes is
	the desired end state, so it's probably not worth doing.

	## Profiling

	### gperftools

	I played with a few profilers, but I think the gperftools profiler turned out to
	be significantly better than the others. To install:

	```
	$ apt install libgoogle-perftools-dev
	$ go install github.com/google/pprof@latest
	```

	To use:

	```
	[possibly modify main.rs to make the app do more work than normal]
	$ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libprofiler.so CPUPROFILE=p ./target/release/n2 ...
	$ pprof -http=:8080 ./target/release/n2 p
	```

	The web server it brings up shows an interactive graph, top functions, annotated
	code, disassembly...

	### Mac

	```
	$ cargo instruments --release --template time --bin n2 -- -C ~/projects/llvm-project-16.0.0.src/build clang-format
	```

	TODO: notes on this vs `cargo flamegraph`.

	### Windows

	It appears `perf` profiling of Rust under WSL2 is not a thing(?).

	## Benchmarking

	### Simple

	This benchmarks end-to-end n2 load time, by asking to build a nonexistent target:

	1. `cargo install hyperfine`
	2. `$ hyperfine -i -- './target/release/n2 -C ~/llvm-project/llvm/utils/gn/out/ xxx'`

	### Micro

	There are microbenchmarks in the `benches/` directory. Run them with:

	```
	$ cargo bench
	```

	If there is a `build.ninja` in the `benches/` directory, the parsing benchmark
	will load it. For example, you can copy the `build.ninja` generated by the
	LLVM CMake build system (66mb on my system!).

	To run just the parsing benchmark:

	```
	$ cargo bench --bench parse -- parse
	```

	When iterating on benchmarks, it can help build time to disable `lto` in release
	mode by commenting out the `lto =` line in `Cargo.toml`. (On my system, `lto`
	is worth ~13% of parsing performance.)