Blame - docs/MergeFunctions.rst - toolchain/llvm

blob: 7c51adac681a07f7107d3862b4d88cf643c4aa00 [file] [log] [blame]

Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	1	=================================
				2	MergeFunctions pass, how it works
				3	=================================
				4
				5	.. contents::
				6	:local:
				7
				8	Introduction
				9	============
				10	Sometimes code contains equal functions, or functions that does exactly the same
				11	thing even though they are non-equal on the IR level (e.g.: multiplication on 2
				12	and 'shl 1'). It could happen due to several reasons: mainly, the usage of
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	13	templates and automatic code generators. Though, sometimes the user itself could
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	14	write the same thing twice :-)
				15
				16	The main purpose of this pass is to recognize such functions and merge them.
				17
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	18	This document is the extension to pass comments and describes the pass logic. It
				19	describes the algorithm that is used in order to compare functions and
				20	explains how we could combine equal functions correctly to keep the module
				21	valid.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	22
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	23	Material is brought in a top-down form, so the reader could start to learn pass
				24	from high level ideas and end with low-level algorithm details, thus preparing
				25	him or her for reading the sources.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	26
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	27	The main goal is to describe the algorithm and logic here and the concept. If
				28	you don't want to read the source code, but want to understand pass
				29	algorithms, this document is good for you. The author tries not to repeat the
				30	source-code and covers only common cases to avoid the cases of needing to
				31	update this document after any minor code changes.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	32
				33
				34	What should I know to be able to follow along with this document?
				35	-----------------------------------------------------------------
				36
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	37	The reader should be familiar with common compile-engineering principles and
				38	LLVM code fundamentals. In this article, we assume the reader is familiar with
				39	`Single Static Assignment
				40	<http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
				41	concept and has an understanding of
				42	`IR structure <http://llvm.org/docs/LangRef.html#high-level-structure>`_.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	43
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	44	We will use terms such as
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	45	"`module <http://llvm.org/docs/LangRef.html#high-level-structure>`_",
				46	"`function <http://llvm.org/docs/ProgrammersManual.html#the-function-class>`_",
				47	"`basic block <http://en.wikipedia.org/wiki/Basic_block>`_",
				48	"`user <http://llvm.org/docs/ProgrammersManual.html#the-user-class>`_",
				49	"`value <http://llvm.org/docs/ProgrammersManual.html#the-value-class>`_",
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	50	"`instruction
				51	<http://llvm.org/docs/ProgrammersManual.html#the-instruction-class>`_".
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	52
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	53	As a good starting point, the Kaleidoscope tutorial can be used:
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	54
				55	:doc:`tutorial/index`
				56
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	57	It's especially important to understand chapter 3 of tutorial:
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	58
Etienne Bergeron	9da4a63	2016-07-13 06:10:37 +0000	[diff] [blame]	59	:doc:`tutorial/LangImpl03`
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	60
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	61	The reader should also know how passes work in LLVM. They could use this
				62	article as a reference and start point here:
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	63
				64	:doc:`WritingAnLLVMPass`
				65
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	66	What else? Well perhaps the reader should also have some experience in LLVM pass
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	67	debugging and bug-fixing.
				68
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	69	Narrative structure
				70	-------------------
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	71	The article consists of three parts. The first part explains pass functionality
				72	on the top-level. The second part describes the comparison procedure itself.
				73	The third part describes the merging process.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	74
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	75	In every part, the author tries to put the contents in the top-down form.
				76	The top-level methods will first be described followed by the terminal ones at
				77	the end, in the tail of each part. If the reader sees the reference to the
Chandler Carruth	bc02668	2015-03-11 00:15:44 +0000	[diff] [blame]	78	method that wasn't described yet, they will find its description a bit below.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	79
				80	Basics
				81	======
				82
				83	How to do it?
				84	-------------
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	85	Do we need to merge functions? The obvious answer is: Yes, that is quite a
				86	possible case. We usually do have duplicates and it would be good to get rid
				87	of them. But how do we detect duplicates? This is the idea: we split functions
				88	into smaller bricks or parts and compare the "bricks" amount. If equal,
				89	we compare the "bricks" themselves, and then do our conclusions about functions
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	90	themselves.
				91
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	92	What could the difference be? For example, on a machine with 64-bit pointers
				93	(let's assume we have only one address space), one function stores a 64-bit
				94	integer, while another one stores a pointer. If the target is the machine
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	95	mentioned above, and if functions are identical, except the parameter type (we
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	96	could consider it as a part of function type), then we can treat a ``uint64_t``
				97	and a ``void*`` as equal.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	98
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	99	This is just an example; more possible details are described a bit below.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	100
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	101	As another example, the reader may imagine two more functions. The first
				102	function performs a multiplication on 2, while the second one performs an
				103	arithmetic right shift on 1.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	104
				105	Possible solutions
				106	^^^^^^^^^^^^^^^^^^
				107	Let's briefly consider possible options about how and what we have to implement
				108	in order to create full-featured functions merging, and also what it would
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	109	mean for us.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	110
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	111	Equal function detection obviously supposes that a "detector" method to be
				112	implemented and latter should answer the question "whether functions are equal".
				113	This "detector" method consists of tiny "sub-detectors", which each answers
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	114	exactly the same question, but for function parts.
				115
				116	As the second step, we should merge equal functions. So it should be a "merger"
				117	method. "Merger" accepts two functions F1 and F2, and produces F1F2
				118	function, the result of merging.
				119
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	120	Having such routines in our hands, we can process a whole module, and merge all
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	121	equal functions.
				122
				123	In this case, we have to compare every function with every another function. As
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	124	the reader may notice, this way seems to be quite expensive. Of course we could
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	125	introduce hashing and other helpers, but it is still just an optimization, and
				126	thus the level of O(N*N) complexity.
				127
				128	Can we reach another level? Could we introduce logarithmical search, or random
				129	access lookup? The answer is: "yes".
				130
				131	Random-access
				132	"""""""""""""
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	133	How it could this be done? Just convert each function to a number, and gather
				134	all of them in a special hash-table. Functions with equal hashes are equal.
				135	Good hashing means, that every function part must be taken into account. That
				136	means we have to convert every function part into some number, and then add it
				137	into the hash. The lookup-up time would be small, but such a approach adds some
				138	delay due to the hashing routine.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	139
				140	Logarithmical search
				141	""""""""""""""""""""
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	142	We could introduce total ordering among the functions set, once ordered we
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	143	could then implement a logarithmical search. Lookup time still depends on N,
				144	but adds a little of delay (log(N)).
				145
				146	Present state
				147	"""""""""""""
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	148	Both of the approaches (random-access and logarithmical) have been implemented
				149	and tested and both give a very good improvement. What was most
				150	surprising is that logarithmical search was faster; sometimes by up to 15%. The
				151	hashing method needs some extra CPU time, which is the main reason why it works
				152	slower; in most cases, total "hashing" time is greater than total
				153	"logarithmical-search" time.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	154
				155	So, preference has been granted to the "logarithmical search".
				156
				157	Though in the case of need, logarithmical-search (read "total-ordering") could
				158	be used as a milestone on our way to the random-access implementation.
				159
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	160	Every comparison is based either on the numbers or on the flags comparison. In
				161	the random-access approach, we could use the same comparison algorithm.
				162	During comparison, we exit once we find the difference, but here we might have
				163	to scan the whole function body every time (note, it could be slower). Like in
				164	"total-ordering", we will track every number and flag, but instead of
				165	comparison, we should get the numbers sequence and then create the hash number.
				166	So, once again, total-ordering could be considered as a milestone for even
				167	faster (in theory) random-access approach.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	168
				169	MergeFunctions, main fields and runOnModule
				170	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	171	There are two main important fields in the class:
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	172
				173	``FnTree`` – the set of all unique functions. It keeps items that couldn't be
				174	merged with each other. It is defined as:
				175
				176	``std::set<FunctionNode> FnTree;``
				177
				178	Here ``FunctionNode`` is a wrapper for ``llvm::Function`` class, with
				179	implemented “<” operator among the functions set (below we explain how it works
				180	exactly; this is a key point in fast functions comparison).
				181
				182	``Deferred`` – merging process can affect bodies of functions that are in
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	183	``FnTree`` already. Obviously, such functions should be rechecked again. In this
				184	case, we remove them from ``FnTree``, and mark them to be rescanned, namely
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	185	put them into ``Deferred`` list.
				186
				187	runOnModule
				188	"""""""""""
				189	The algorithm is pretty simple:
				190
				191	1. Put all module's functions into the worklist.
				192
				193	2. Scan worklist's functions twice: first enumerate only strong functions and
				194	then only weak ones:
				195
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	196	2.1. Loop body: take a function from worklist (call it FCur) and try to
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	197	insert it into FnTree: check whether FCur is equal to one of functions
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	198	in FnTree. If there is an equal function in FnTree
				199	(call it FExists): merge function FCur with FExists. Otherwise add
				200	the function from the worklist to FnTree.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	201
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	202	3. Once the worklist scanning and merging operations are complete, check the
				203	Deferred list. If it is not empty: refill the worklist contents with
				204	Deferred list and redo step 2, if the Deferred list is empty, then exit
				205	from method.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	206
				207	Comparison and logarithmical search
				208	"""""""""""""""""""""""""""""""""""
				209	Let's recall our task: for every function F from module M, we have to find
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	210	equal functions F` in the shortest time possible , and merge them into a
				211	single function.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	212
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	213	Defining total ordering among the functions set allows us to organize
				214	functions into a binary tree. The lookup procedure complexity would be
				215	estimated as O(log(N)) in this case. But how do we define total-ordering?
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	216
				217	We have to introduce a single rule applicable to every pair of functions, and
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	218	following this rule, then evaluate which of them is greater. What kind of rule
				219	could it be? Let's declare it as the "compare" method that returns one of 3
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	220	possible values:
				221
				222	-1, left is less than right,
				223
				224	0, left and right are equal,
				225
				226	1, left is greater than right.
				227
				228	Of course it means, that we have to maintain
				229	strict and non-strict order relation properties:
				230
				231	* reflexivity (``a <= a``, ``a == a``, ``a >= a``),
				232	* antisymmetry (if ``a <= b`` and ``b <= a`` then ``a == b``),
				233	* transitivity (``a <= b`` and ``b <= c``, then ``a <= c``)
				234	* asymmetry (if ``a < b``, then ``a > b`` or ``a == b``).
				235
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	236	As mentioned before, the comparison routine consists of
				237	"sub-comparison-routines", with each of them also consisting of
				238	"sub-comparison-routines", and so on. Finally, it ends up with primitive
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	239	comparison.
				240
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	241	Below, we will use the following operations:
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	242
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	243	#. ``cmpNumbers(number1, number2)`` is a method that returns -1 if left is less
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	244	than right; 0, if left and right are equal; and 1 otherwise.
				245
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	246	#. ``cmpFlags(flag1, flag2)`` is a hypothetical method that compares two flags.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	247	The logic is the same as in ``cmpNumbers``, where ``true`` is 1, and
				248	``false`` is 0.
				249
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	250	The rest of the article is based on MergeFunctions.cpp source code
				251	(found in <llvm_dir>/lib/Transforms/IPO/MergeFunctions.cpp). We would like
				252	to ask reader to keep this file open, so we could use it as a reference
				253	for further explanations.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	254
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	255	Now, we're ready to proceed to the next chapter and see how it works.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	256
				257	Functions comparison
				258	====================
				259	At first, let's define how exactly we compare complex objects.
				260
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	261	Complex object comparison (function, basic-block, etc) is mostly based on its
				262	sub-object comparison results. It is similar to the next "tree" objects
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	263	comparison:
				264
				265	#. For two trees T1 and T2 we perform depth-first-traversal and have
				266	two sequences as a product: "T1Items" and "T2Items".
				267
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	268	#. We then compare chains "T1Items" and "T2Items" in
				269	the most-significant-item-first order. The result of items comparison
				270	would be the result of T1 and T2 comparison itself.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	271
				272	FunctionComparator::compare(void)
				273	---------------------------------
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	274	A brief look at the source code tells us that the comparison starts in the
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	275	“``int FunctionComparator::compare(void)``” method.
				276
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	277	1. The first parts to be compared are the function's attributes and some
				278	properties that is outside the “attributes” term, but still could make the
				279	function different without changing its body. This part of the comparison is
				280	usually done within simple cmpNumbers or cmpFlags operations (e.g.
				281	``cmpFlags(F1->hasGC(), F2->hasGC())``). Below is a full list of function's
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	282	properties to be compared on this stage:
				283
				284	* Attributes (those are returned by ``Function::getAttributes()``
				285	method).
				286
				287	* GC, for equivalence, RHS and LHS should be both either without
				288	GC or with the same one.
				289
				290	* Section, just like a GC: RHS and LHS should be defined in the
				291	same section.
				292
				293	* Variable arguments. LHS and RHS should be both either with or
				294	without var-args.
				295
				296	* Calling convention should be the same.
				297
				298	2. Function type. Checked by ``FunctionComparator::cmpType(Type, Type)``
				299	method. It checks return type and parameters type; the method itself will be
				300	described later.
				301
				302	3. Associate function formal parameters with each other. Then comparing function
				303	bodies, if we see the usage of LHS's i-th argument in LHS's body, then,
				304	we want to see usage of RHS's i-th argument at the same place in RHS's
				305	body, otherwise functions are different. On this stage we grant the preference
				306	to those we met later in function body (value we met first would be less).
				307	This is done by “``FunctionComparator::cmpValues(const Value, const Value)``”
				308	method (will be described a bit later).
				309
				310	4. Function body comparison. As it written in method comments:
				311
				312	“We do a CFG-ordered walk since the actual ordering of the blocks in the linked
				313	list is immaterial. Our walk starts at the entry block for both functions, then
				314	takes each block from each terminator in order. As an artifact, this also means
				315	that unreachable blocks are ignored.”
				316
				317	So, using this walk we get BBs from left and right in the same order, and
				318	compare them by “``FunctionComparator::compare(const BasicBlock*, const
				319	BasicBlock*)``” method.
				320
				321	We also associate BBs with each other, like we did it with function formal
				322	arguments (see ``cmpValues`` method below).
				323
				324	FunctionComparator::cmpType
				325	---------------------------
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	326	Consider how type comparison works.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	327
				328	1. Coerce pointer to integer. If left type is a pointer, try to coerce it to the
				329	integer type. It could be done if its address space is 0, or if address spaces
				330	are ignored at all. Do the same thing for the right type.
				331
				332	2. If left and right types are equal, return 0. Otherwise we need to give
				333	preference to one of them. So proceed to the next step.
				334
				335	3. If types are of different kind (different type IDs). Return result of type
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	336	IDs comparison, treating them as numbers (use ``cmpNumbers`` operation).
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	337
				338	4. If types are vectors or integers, return result of their pointers comparison,
				339	comparing them as numbers.
				340
				341	5. Check whether type ID belongs to the next group (call it equivalent-group):
				342
				343	* Void
				344
				345	* Float
				346
				347	* Double
				348
				349	* X86_FP80
				350
				351	* FP128
				352
				353	* PPC_FP128
				354
				355	* Label
				356
				357	* Metadata.
				358
				359	If ID belongs to group above, return 0. Since it's enough to see that
				360	types has the same ``TypeID``. No additional information is required.
				361
				362	6. Left and right are pointers. Return result of address space comparison
				363	(numbers comparison).
				364
				365	7. Complex types (structures, arrays, etc.). Follow complex objects comparison
				366	technique (see the very first paragraph of this chapter). Both left and
				367	right are to be expanded and their element types will be checked the same
				368	way. If we get -1 or 1 on some stage, return it. Otherwise return 0.
				369
				370	8. Steps 1-6 describe all the possible cases, if we passed steps 1-6 and didn't
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	371	get any conclusions, then invoke ``llvm_unreachable``, since it's quite an
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	372	unexpectable case.
				373
				374	cmpValues(const Value, const Value)
				375	-------------------------------------
				376	Method that compares local values.
				377
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	378	This method gives us an answer to a very curious question: whether we could
				379	treat local values as equal, and which value is greater otherwise. It's
				380	better to start from example:
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	381
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	382	Consider the situation when we're looking at the same place in left
				383	function "FL" and in right function "FR". Every part of left place is
				384	equal to the corresponding part of right place, and (!) both parts use
				385	Value instances, for example:
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	386
Renato Golin	88ea57f	2016-07-20 12:16:38 +0000	[diff] [blame]	387	.. code-block:: text
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	388
				389	instr0 i32 %LV ; left side, function FL
				390	instr0 i32 %RV ; right side, function FR
				391
				392	So, now our conclusion depends on Value instances comparison.
				393
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	394	The main purpose of this method is to determine relation between such values.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	395
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	396	What can we expect from equal functions? At the same place, in functions
				397	"FL" and "FR" we expect to see equal values, or values defined at
				398	the same place in "FL" and "FR".
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	399
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	400	Consider a small example here:
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	401
Renato Golin	88ea57f	2016-07-20 12:16:38 +0000	[diff] [blame]	402	.. code-block:: text
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	403
				404	define void %f(i32 %pf0, i32 %pf1) {
				405	instr0 i32 %pf0 instr1 i32 %pf1 instr2 i32 123
				406	}
				407
Renato Golin	88ea57f	2016-07-20 12:16:38 +0000	[diff] [blame]	408	.. code-block:: text
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	409
				410	define void %g(i32 %pg0, i32 %pg1) {
				411	instr0 i32 %pg0 instr1 i32 %pg0 instr2 i32 123
				412	}
				413
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	414	In this example, pf0 is associated with pg0, pf1 is associated with
				415	pg1, and we also declare that pf0 < pf1, and thus pg0 < pf1.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	416
				417	Instructions with opcode "instr0" would be equal, since their types and
				418	opcodes are equal, and values are associated.
				419
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	420	Instructions with opcode "instr1" from f is greater than instructions
				421	with opcode "instr1" from g; here we have equal types and opcodes, but
				422	"pf1 is greater than "pg0".
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	423
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	424	Instructions with opcode "instr2" are equal, because their opcodes and
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	425	types are equal, and the same constant is used as a value.
				426
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	427	What we associate in cmpValues?
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	428	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
				429	* Function arguments. i-th argument from left function associated with
				430	i-th argument from right function.
				431	* BasicBlock instances. In basic-block enumeration loop we associate i-th
				432	BasicBlock from the left function with i-th BasicBlock from the right
				433	function.
				434	* Instructions.
				435	* Instruction operands. Note, we can meet Value here we have never seen
				436	before. In this case it is not a function argument, nor BasicBlock, nor
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	437	Instruction. It is a global value. It is a constant, since it's the only
				438	supposed global here. The method also compares: Constants that are of the
				439	same type and if right constant can be losslessly bit-casted to the left
				440	one, then we also compare them.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	441
				442	How to implement cmpValues?
				443	^^^^^^^^^^^^^^^^^^^^^^^^^^^
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	444	Association is a case of equality for us. We just treat such values as equal,
				445	but, in general, we need to implement antisymmetric relation. As mentioned
				446	above, to understand what is less, we can use order in which we
				447	meet values. If both values have the same order in a function (met at the same
				448	time), we then treat values as associated. Otherwise – it depends on who was
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	449	first.
				450
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	451	Every time we run the top-level compare method, we initialize two identical
				452	maps (one for the left side, another one for the right side):
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	453
				454	``map<Value, int> sn_mapL, sn_mapR;``
				455
				456	The key of the map is the Value itself, the value – is its order (call it
				457	serial number).
				458
				459	To add value V we need to perform the next procedure:
				460
				461	``sn_map.insert(std::make_pair(V, sn_map.size()));``
				462
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	463	For the first Value, map will return 0, for the second Value map will
				464	return 1, and so on.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	465
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	466	We can then check whether left and right values met at the same time with
				467	a simple comparison:
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	468
				469	``cmpNumbers(sn_mapL[Left], sn_mapR[Right]);``
				470
				471	Of course, we can combine insertion and comparison:
				472
				473	.. code-block:: c++
				474
				475	std::pair<iterator, bool>
				476	LeftRes = sn_mapL.insert(std::make_pair(Left, sn_mapL.size())), RightRes
				477	= sn_mapR.insert(std::make_pair(Right, sn_mapR.size()));
				478	return cmpNumbers(LeftRes.first->second, RightRes.first->second);
				479
				480	Let's look, how whole method could be implemented.
				481
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	482	1. We have to start with the bad news. Consider function self and
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	483	cross-referencing cases:
				484
				485	.. code-block:: c++
				486
				487	// self-reference unsigned fact0(unsigned n) { return n > 1 ? n
				488	* fact0(n-1) : 1; } unsigned fact1(unsigned n) { return n > 1 ? n *
				489	fact1(n-1) : 1; }
				490
				491	// cross-reference unsigned ping(unsigned n) { return n!= 0 ? pong(n-1) : 0;
				492	} unsigned pong(unsigned n) { return n!= 0 ? ping(n-1) : 0; }
				493
				494	..
				495
				496	This comparison has been implemented in initial MergeFunctions pass
				497	version. But, unfortunately, it is not transitive. And this is the only case
				498	we can't convert to less-equal-greater comparison. It is a seldom case, 4-5
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	499	functions of 10000 (checked in test-suite), and, we hope, the reader would
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	500	forgive us for such a sacrifice in order to get the O(log(N)) pass time.
				501
				502	2. If left/right Value is a constant, we have to compare them. Return 0 if it
				503	is the same constant, or use ``cmpConstants`` method otherwise.
				504
				505	3. If left/right is InlineAsm instance. Return result of Value pointers
				506	comparison.
				507
				508	4. Explicit association of L (left value) and R (right value). We need to
				509	find out whether values met at the same time, and thus are associated. Or we
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	510	need to put the rule: when we treat L < R. Now it is easy: we just return
				511	the result of numbers comparison:
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	512
				513	.. code-block:: c++
				514
				515	std::pair<iterator, bool>
				516	LeftRes = sn_mapL.insert(std::make_pair(Left, sn_mapL.size())),
				517	RightRes = sn_mapR.insert(std::make_pair(Right, sn_mapR.size()));
				518	if (LeftRes.first->second == RightRes.first->second) return 0;
				519	if (LeftRes.first->second < RightRes.first->second) return -1;
				520	return 1;
				521
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	522	Now when cmpValues returns 0, we can proceed the comparison procedure.
				523	Otherwise, if we get (-1 or 1), we need to pass this result to the top level,
				524	and finish comparison procedure.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	525
				526	cmpConstants
				527	------------
				528	Performs constants comparison as follows:
				529
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	530	1. Compare constant types using ``cmpType`` method. If the result is -1 or 1,
				531	goto step 2, otherwise proceed to step 3.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	532
				533	2. If types are different, we still can check whether constants could be
				534	losslessly bitcasted to each other. The further explanation is modification of
				535	``canLosslesslyBitCastTo`` method.
				536
				537	2.1 Check whether constants are of the first class types
				538	(``isFirstClassType`` check):
				539
				540	2.1.1. If both constants are not of the first class type: return result
				541	of ``cmpType``.
				542
				543	2.1.2. Otherwise, if left type is not of the first class, return -1. If
				544	right type is not of the first class, return 1.
				545
				546	2.1.3. If both types are of the first class type, proceed to the next step
				547	(2.1.3.1).
				548
				549	2.1.3.1. If types are vectors, compare their bitwidth using the
				550	cmpNumbers. If result is not 0, return it.
				551
				552	2.1.3.2. Different types, but not a vectors:
				553
				554	* if both of them are pointers, good for us, we can proceed to step 3.
				555	* if one of types is pointer, return result of isPointer flags
				556	comparison (cmpFlags operation).
				557	* otherwise we have no methods to prove bitcastability, and thus return
				558	result of types comparison (-1 or 1).
				559
				560	Steps below are for the case when types are equal, or case when constants are
				561	bitcastable:
				562
				563	3. One of constants is a "null" value. Return the result of
				564	``cmpFlags(L->isNullValue, R->isNullValue)`` comparison.
				565
				566	4. Compare value IDs, and return result if it is not 0:
				567
				568	.. code-block:: c++
				569
				570	if (int Res = cmpNumbers(L->getValueID(), R->getValueID()))
				571	return Res;
				572
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	573	5. Compare the contents of constants. The comparison depends on the kind of
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	574	constants, but on this stage it is just a lexicographical comparison. Just see
				575	how it was described in the beginning of "Functions comparison" paragraph.
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	576	Mathematically, it is equal to the next case: we encode left constant and right
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	577	constant (with similar way bitcode-writer does). Then compare left code
				578	sequence and right code sequence.
				579
				580	compare(const BasicBlock, const BasicBlock)
				581	---------------------------------------------
				582	Compares two BasicBlock instances.
				583
				584	It enumerates instructions from left BB and right BB.
				585
				586	1. It assigns serial numbers to the left and right instructions, using
				587	``cmpValues`` method.
				588
				589	2. If one of left or right is GEP (``GetElementPtr``), then treat GEP as
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	590	greater than other instructions. If both instructions are GEPs use ``cmpGEP``
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	591	method for comparison. If result is -1 or 1, pass it to the top-level
				592	comparison (return it).
				593
				594	3.1. Compare operations. Call ``cmpOperation`` method. If result is -1 or
				595	1, return it.
				596
				597	3.2. Compare number of operands, if result is -1 or 1, return it.
				598
				599	3.3. Compare operands themselves, use ``cmpValues`` method. Return result
				600	if it is -1 or 1.
				601
				602	3.4. Compare type of operands, using ``cmpType`` method. Return result if
				603	it is -1 or 1.
				604
				605	3.5. Proceed to the next instruction.
				606
				607	4. We can finish instruction enumeration in 3 cases:
				608
				609	4.1. We reached the end of both left and right basic-blocks. We didn't
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	610	exit on steps 1-3, so contents are equal, return 0.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	611
				612	4.2. We have reached the end of the left basic-block. Return -1.
				613
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	614	4.3. Return 1 (we reached the end of the right basic block).
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	615
				616	cmpGEP
				617	------
				618	Compares two GEPs (``getelementptr`` instructions).
				619
				620	It differs from regular operations comparison with the only thing: possibility
				621	to use ``accumulateConstantOffset`` method.
				622
				623	So, if we get constant offset for both left and right GEPs, then compare it as
				624	numbers, and return comparison result.
				625
				626	Otherwise treat it like a regular operation (see previous paragraph).
				627
				628	cmpOperation
				629	------------
				630	Compares instruction opcodes and some important operation properties.
				631
				632	1. Compare opcodes, if it differs return the result.
				633
				634	2. Compare number of operands. If it differs – return the result.
				635
				636	3. Compare operation types, use cmpType. All the same – if types are
				637	different, return result.
				638
				639	4. Compare subclassOptionalData, get it with ``getRawSubclassOptionalData``
				640	method, and compare it like a numbers.
				641
				642	5. Compare operand types.
				643
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	644	6. For some particular instructions, check equivalence (relation in our case) of
				645	some significant attributes. For example, we have to compare alignment for
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	646	``load`` instructions.
				647
				648	O(log(N))
				649	---------
				650	Methods described above implement order relationship. And latter, could be used
				651	for nodes comparison in a binary tree. So we can organize functions set into
				652	the binary tree and reduce the cost of lookup procedure from
				653	O(N*N) to O(log(N)).
				654
				655	Merging process, mergeTwoFunctions
				656	==================================
				657	Once MergeFunctions detected that current function (G) is equal to one that
				658	were analyzed before (function F) it calls ``mergeTwoFunctions(Function*,
				659	Function*)``.
				660
				661	Operation affects ``FnTree`` contents with next way: F will stay in
				662	``FnTree``. G being equal to F will not be added to ``FnTree``. Calls of
				663	G would be replaced with something else. It changes bodies of callers. So,
				664	functions that calls G would be put into ``Deferred`` set and removed from
				665	``FnTree``, and analyzed again.
				666
				667	The approach is next:
				668
				669	1. Most wished case: when we can use alias and both of F and G are weak. We
				670	make both of them with aliases to the third strong function H. Actually H
				671	is F. See below how it's made (but it's better to look straight into the
				672	source code). Well, this is a case when we can just replace G with F
				673	everywhere, we use ``replaceAllUsesWith`` operation here (RAUW).
				674
				675	2. F could not be overridden, while G could. It would be good to do the
				676	next: after merging the places where overridable function were used, still use
				677	overridable stub. So try to make G alias to F, or create overridable tail
				678	call wrapper around F and replace G with that call.
				679
				680	3. Neither F nor G could be overridden. We can't use RAUW. We can just
				681	change the callers: call F instead of G. That's what
				682	``replaceDirectCallers`` does.
				683
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	684	Below is a detailed body description.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	685
				686	If “F” may be overridden
				687	------------------------
				688	As follows from ``mayBeOverridden`` comments: “whether the definition of this
Sylvestre Ledru	3c5ec72	2016-02-14 20:16:22 +0000	[diff] [blame]	689	global may be replaced by something non-equivalent at link time”. If so, that's
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	690	ok: we can use alias to F instead of G or change call instructions itself.
				691
				692	HasGlobalAliases, removeUsers
				693	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
				694	First consider the case when we have global aliases of one function name to
				695	another. Our purpose is make both of them with aliases to the third strong
				696	function. Though if we keep F alive and without major changes we can leave it
				697	in ``FnTree``. Try to combine these two goals.
				698
				699	Do stub replacement of F itself with an alias to F.
				700
				701	1. Create stub function H, with the same name and attributes like function
				702	F. It takes maximum alignment of F and G.
				703
				704	2. Replace all uses of function F with uses of function H. It is the two
				705	steps procedure instead. First of all, we must take into account, all functions
				706	from whom F is called would be changed: since we change the call argument
				707	(from F to H). If so we must to review these caller functions again after
				708	this procedure. We remove callers from ``FnTree``, method with name
				709	``removeUsers(F)`` does that (don't confuse with ``replaceAllUsesWith``):
				710
				711	2.1. ``Inside removeUsers(Value*
				712	V)`` we go through the all values that use value V (or F in our context).
				713	If value is instruction, we go to function that holds this instruction and
				714	mark it as to-be-analyzed-again (put to ``Deferred`` set), we also remove
				715	caller from ``FnTree``.
				716
				717	2.2. Now we can do the replacement: call ``F->replaceAllUsesWith(H)``.
				718
				719	3. H (that now "officially" plays F's role) is replaced with alias to F.
				720	Do the same with G: replace it with alias to F. So finally everywhere F
				721	was used, we use H and it is alias to F, and everywhere G was used we
				722	also have alias to F.
				723
				724	4. Set F linkage to private. Make it strong :-)
				725
				726	No global aliases, replaceDirectCallers
				727	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	728	If global aliases are not supported. We call ``replaceDirectCallers``. Just
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	729	go through all calls of G and replace it with calls of F. If you look into
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	730	the method you will see that it scans all uses of G too, and if use is callee
				731	(if user is call instruction and G is used as what to be called), we replace
				732	it with use of F.
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	733
				734	If “F” could not be overridden, fix it!
				735	"""""""""""""""""""""""""""""""""""""""
				736
				737	We call ``writeThunkOrAlias(Function F, Function G)``. Here we try to replace
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	738	G with alias to F first. The next conditions are essential:
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	739
				740	* target should support global aliases,
				741	* the address itself of G should be not significant, not named and not
				742	referenced anywhere,
				743	* function should come with external, local or weak linkage.
				744
				745	Otherwise we write thunk: some wrapper that has G's interface and calls F,
				746	so G could be replaced with this wrapper.
				747
				748	writeAlias
				749
				750	As follows from llvm reference:
				751
				752	“Aliases act as second name for the aliasee value”. So we just want to create
Aditya Kumar	a83fad8	2018-08-18 20:17:19 +0000	[diff] [blame]	753	a second name for F and use it instead of G:
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	754
				755	1. create global alias itself (GA),
				756
				757	2. adjust alignment of F so it must be maximum of current and G's alignment;
				758
				759	3. replace uses of G:
				760
				761	3.1. first mark all callers of G as to-be-analyzed-again, using
				762	``removeUsers`` method (see chapter above),
				763
				764	3.2. call ``G->replaceAllUsesWith(GA)``.
				765
				766	4. Get rid of G.
				767
				768	writeThunk
				769
				770	As it written in method comments:
				771
				772	“Replace G with a simple tail call to bitcast(F). Also replace direct uses of G
				773	with bitcast(F). Deletes G.”
				774
				775	In general it does the same as usual when we want to replace callee, except the
				776	first point:
				777
				778	1. We generate tail call wrapper around F, but with interface that allows use
				779	it instead of G.
				780
				781	2. “As-usual”: ``removeUsers`` and ``replaceAllUsesWith`` then.
				782
				783	3. Get rid of G.
				784
Stepan Dyatkovskiy	4912640	2014-12-10 17:42:01 +0000	[diff] [blame]	785