 # Ninja File Canonicalizer

 Suppose we have a tool that generates a Ninja file from some other description (think Kati and makefiles), and during
 testing we discover a regression. Suppose further that the generated Ninja file is large (think millions of lines) and
 that the new Ninja file has its build statements and rules in a slightly different order. Because the tool generates the
 rule names, the real differences in the output of the `diff` command are drowned in noise. Enter Canoninja.

 Canoninja renames each Ninja rule to the hash of its contents. After that, we can just sort the build statements, and a
 simple `comm` command immediately reveals the essential differences between the files.

 ## Example

 Consider the following makefile:

 ```makefile
 second :=
 first: foo
 foo:
 	@echo foo
 second: bar
 bar:
 	@echo bar
 ```

 Depending on the Kati version, converting it to a Ninja file yields either:

 ```
 $ cat /tmp/1.ninja
 # Generated by kati 06f2569b2d16628608c000a76e3d495a5a5528cb

 pool local_pool
  depth = 72

 build _kati_always_build_: phony

 build first: phony foo
 rule rule0
  description = build $out
  command = /bin/sh -c "echo foo"
 build foo: rule0
 build second: phony bar
 rule rule1
  description = build $out
  command = /bin/sh -c "echo bar"
 build bar: rule1

 default first
 ```

 or

 ```
 $ cat /tmp/2.ninja
 # Generated by kati 371194da71b3e191fea6f2ccceb7b061bd0de310

 pool local_pool
  depth = 72

 build _kati_always_build_: phony

 build second: phony bar
 rule rule0
  description = build $out
  command = /bin/sh -c "echo bar"
 build bar: rule0
 build first: phony foo
 rule rule1
  description = build $out
  command = /bin/sh -c "echo foo"
 build foo: rule1

 default first
 ```

 This is a quirk in Kati; see https://github.com/google/kati/issues/238

 Trying to find the differences between the targets isn't too helpful, even after sorting them:

 ```
 $ diff <(grep '^build' /tmp/1.ninja | sort) <(grep '^build' /tmp/2.ninja | sort)
 1c1
 < build bar: rule1
 ---
 > build bar: rule0
 3c3
 < build foo: rule0
 ---
 > build foo: rule1
 ```

 However, running these files through `canoninja` yields

 ```
 $ canoninja /tmp/1.ninja
 # Generated by kati 06f2569b2d16628608c000a76e3d495a5a5528cb

 pool local_pool
  depth = 72

 build _kati_always_build_: phony

 build first: phony foo
 rule R2f9981d3c152fc255370dc67028244f7bed72a03
  description = build $out
  command = /bin/sh -c "echo foo"
 build foo: R2f9981d3c152fc255370dc67028244f7bed72a03
 build second: phony bar
 rule R62640f3f9095cf2da5b9d9e2a82f746cc710c94c
  description = build $out
  command = /bin/sh -c "echo bar"
 build bar: R62640f3f9095cf2da5b9d9e2a82f746cc710c94c

 default first
 ```

 and

 ```
 $ ~/go/bin/canoninja /tmp/2.ninja
 # Generated by kati 371194da71b3e191fea6f2ccceb7b061bd0de310

 pool local_pool
  depth = 72

 build _kati_always_build_: phony

 build second: phony bar
 rule R62640f3f9095cf2da5b9d9e2a82f746cc710c94c
  description = build $out
  command = /bin/sh -c "echo bar"
 build bar: R62640f3f9095cf2da5b9d9e2a82f746cc710c94c
 build first: phony foo
 rule R2f9981d3c152fc255370dc67028244f7bed72a03
  description = build $out
  command = /bin/sh -c "echo foo"
 build foo: R2f9981d3c152fc255370dc67028244f7bed72a03

 default first
 ```

 and when we extract only build statements and sort them, we see that both Ninja files define the same graph:

 ```shell
 $ diff <(~/go/bin/canoninja /tmp/1.ninja | grep '^build' | sort) \
        <(~/go/bin/canoninja /tmp/2.ninja | grep '^build' | sort)
 ```

 ## Todo

 * Optionally output only the build statements, optionally sorted
 * Handle continuation lines correctly