Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 1 | ================================= |
| 2 | MergeFunctions pass, how it works |
| 3 | ================================= |
| 4 | |
| 5 | .. contents:: |
| 6 | :local: |
| 7 | |
| 8 | Introduction |
| 9 | ============ |
| 10 | Sometimes code contains equal functions, or functions that does exactly the same |
| 11 | thing even though they are non-equal on the IR level (e.g.: multiplication on 2 |
| 12 | and 'shl 1'). It could happen due to several reasons: mainly, the usage of |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 13 | templates and automatic code generators. Though, sometimes the user itself could |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 14 | write the same thing twice :-) |
| 15 | |
| 16 | The main purpose of this pass is to recognize such functions and merge them. |
| 17 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 18 | This document is the extension to pass comments and describes the pass logic. It |
| 19 | describes the algorithm that is used in order to compare functions and |
| 20 | explains how we could combine equal functions correctly to keep the module |
| 21 | valid. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 22 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 23 | Material is brought in a top-down form, so the reader could start to learn pass |
| 24 | from high level ideas and end with low-level algorithm details, thus preparing |
| 25 | him or her for reading the sources. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 26 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 27 | The main goal is to describe the algorithm and logic here and the concept. If |
| 28 | you *don't want* to read the source code, but want to understand pass |
| 29 | algorithms, this document is good for you. The author tries not to repeat the |
| 30 | source-code and covers only common cases to avoid the cases of needing to |
| 31 | update this document after any minor code changes. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 32 | |
| 33 | |
| 34 | What should I know to be able to follow along with this document? |
| 35 | ----------------------------------------------------------------- |
| 36 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 37 | The reader should be familiar with common compile-engineering principles and |
| 38 | LLVM code fundamentals. In this article, we assume the reader is familiar with |
| 39 | `Single Static Assignment |
| 40 | <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_ |
| 41 | concept and has an understanding of |
| 42 | `IR structure <http://llvm.org/docs/LangRef.html#high-level-structure>`_. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 43 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 44 | We will use terms such as |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 45 | "`module <http://llvm.org/docs/LangRef.html#high-level-structure>`_", |
| 46 | "`function <http://llvm.org/docs/ProgrammersManual.html#the-function-class>`_", |
| 47 | "`basic block <http://en.wikipedia.org/wiki/Basic_block>`_", |
| 48 | "`user <http://llvm.org/docs/ProgrammersManual.html#the-user-class>`_", |
| 49 | "`value <http://llvm.org/docs/ProgrammersManual.html#the-value-class>`_", |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 50 | "`instruction |
| 51 | <http://llvm.org/docs/ProgrammersManual.html#the-instruction-class>`_". |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 52 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 53 | As a good starting point, the Kaleidoscope tutorial can be used: |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 54 | |
| 55 | :doc:`tutorial/index` |
| 56 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 57 | It's especially important to understand chapter 3 of tutorial: |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 58 | |
Etienne Bergeron | 9da4a63 | 2016-07-13 06:10:37 +0000 | [diff] [blame] | 59 | :doc:`tutorial/LangImpl03` |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 60 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 61 | The reader should also know how passes work in LLVM. They could use this |
| 62 | article as a reference and start point here: |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 63 | |
| 64 | :doc:`WritingAnLLVMPass` |
| 65 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 66 | What else? Well perhaps the reader should also have some experience in LLVM pass |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 67 | debugging and bug-fixing. |
| 68 | |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 69 | Narrative structure |
| 70 | ------------------- |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 71 | The article consists of three parts. The first part explains pass functionality |
| 72 | on the top-level. The second part describes the comparison procedure itself. |
| 73 | The third part describes the merging process. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 74 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 75 | In every part, the author tries to put the contents in the top-down form. |
| 76 | The top-level methods will first be described followed by the terminal ones at |
| 77 | the end, in the tail of each part. If the reader sees the reference to the |
Chandler Carruth | bc02668 | 2015-03-11 00:15:44 +0000 | [diff] [blame] | 78 | method that wasn't described yet, they will find its description a bit below. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 79 | |
| 80 | Basics |
| 81 | ====== |
| 82 | |
| 83 | How to do it? |
| 84 | ------------- |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 85 | Do we need to merge functions? The obvious answer is: Yes, that is quite a |
| 86 | possible case. We usually *do* have duplicates and it would be good to get rid |
| 87 | of them. But how do we detect duplicates? This is the idea: we split functions |
| 88 | into smaller bricks or parts and compare the "bricks" amount. If equal, |
| 89 | we compare the "bricks" themselves, and then do our conclusions about functions |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 90 | themselves. |
| 91 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 92 | What could the difference be? For example, on a machine with 64-bit pointers |
| 93 | (let's assume we have only one address space), one function stores a 64-bit |
| 94 | integer, while another one stores a pointer. If the target is the machine |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 95 | mentioned above, and if functions are identical, except the parameter type (we |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 96 | could consider it as a part of function type), then we can treat a ``uint64_t`` |
| 97 | and a ``void*`` as equal. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 98 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 99 | This is just an example; more possible details are described a bit below. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 100 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 101 | As another example, the reader may imagine two more functions. The first |
| 102 | function performs a multiplication on 2, while the second one performs an |
| 103 | arithmetic right shift on 1. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 104 | |
| 105 | Possible solutions |
| 106 | ^^^^^^^^^^^^^^^^^^ |
| 107 | Let's briefly consider possible options about how and what we have to implement |
| 108 | in order to create full-featured functions merging, and also what it would |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 109 | mean for us. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 110 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 111 | Equal function detection obviously supposes that a "detector" method to be |
| 112 | implemented and latter should answer the question "whether functions are equal". |
| 113 | This "detector" method consists of tiny "sub-detectors", which each answers |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 114 | exactly the same question, but for function parts. |
| 115 | |
| 116 | As the second step, we should merge equal functions. So it should be a "merger" |
| 117 | method. "Merger" accepts two functions *F1* and *F2*, and produces *F1F2* |
| 118 | function, the result of merging. |
| 119 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 120 | Having such routines in our hands, we can process a whole module, and merge all |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 121 | equal functions. |
| 122 | |
| 123 | In this case, we have to compare every function with every another function. As |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 124 | the reader may notice, this way seems to be quite expensive. Of course we could |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 125 | introduce hashing and other helpers, but it is still just an optimization, and |
| 126 | thus the level of O(N*N) complexity. |
| 127 | |
| 128 | Can we reach another level? Could we introduce logarithmical search, or random |
| 129 | access lookup? The answer is: "yes". |
| 130 | |
| 131 | Random-access |
| 132 | """"""""""""" |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 133 | How it could this be done? Just convert each function to a number, and gather |
| 134 | all of them in a special hash-table. Functions with equal hashes are equal. |
| 135 | Good hashing means, that every function part must be taken into account. That |
| 136 | means we have to convert every function part into some number, and then add it |
| 137 | into the hash. The lookup-up time would be small, but such a approach adds some |
| 138 | delay due to the hashing routine. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 139 | |
| 140 | Logarithmical search |
| 141 | """""""""""""""""""" |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 142 | We could introduce total ordering among the functions set, once ordered we |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 143 | could then implement a logarithmical search. Lookup time still depends on N, |
| 144 | but adds a little of delay (*log(N)*). |
| 145 | |
| 146 | Present state |
| 147 | """"""""""""" |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 148 | Both of the approaches (random-access and logarithmical) have been implemented |
| 149 | and tested and both give a very good improvement. What was most |
| 150 | surprising is that logarithmical search was faster; sometimes by up to 15%. The |
| 151 | hashing method needs some extra CPU time, which is the main reason why it works |
| 152 | slower; in most cases, total "hashing" time is greater than total |
| 153 | "logarithmical-search" time. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 154 | |
| 155 | So, preference has been granted to the "logarithmical search". |
| 156 | |
| 157 | Though in the case of need, *logarithmical-search* (read "total-ordering") could |
| 158 | be used as a milestone on our way to the *random-access* implementation. |
| 159 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 160 | Every comparison is based either on the numbers or on the flags comparison. In |
| 161 | the *random-access* approach, we could use the same comparison algorithm. |
| 162 | During comparison, we exit once we find the difference, but here we might have |
| 163 | to scan the whole function body every time (note, it could be slower). Like in |
| 164 | "total-ordering", we will track every number and flag, but instead of |
| 165 | comparison, we should get the numbers sequence and then create the hash number. |
| 166 | So, once again, *total-ordering* could be considered as a milestone for even |
| 167 | faster (in theory) random-access approach. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 168 | |
| 169 | MergeFunctions, main fields and runOnModule |
| 170 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 171 | There are two main important fields in the class: |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 172 | |
| 173 | ``FnTree`` – the set of all unique functions. It keeps items that couldn't be |
| 174 | merged with each other. It is defined as: |
| 175 | |
| 176 | ``std::set<FunctionNode> FnTree;`` |
| 177 | |
| 178 | Here ``FunctionNode`` is a wrapper for ``llvm::Function`` class, with |
| 179 | implemented “<” operator among the functions set (below we explain how it works |
| 180 | exactly; this is a key point in fast functions comparison). |
| 181 | |
| 182 | ``Deferred`` – merging process can affect bodies of functions that are in |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 183 | ``FnTree`` already. Obviously, such functions should be rechecked again. In this |
| 184 | case, we remove them from ``FnTree``, and mark them to be rescanned, namely |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 185 | put them into ``Deferred`` list. |
| 186 | |
| 187 | runOnModule |
| 188 | """"""""""" |
| 189 | The algorithm is pretty simple: |
| 190 | |
| 191 | 1. Put all module's functions into the *worklist*. |
| 192 | |
| 193 | 2. Scan *worklist*'s functions twice: first enumerate only strong functions and |
| 194 | then only weak ones: |
| 195 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 196 | 2.1. Loop body: take a function from *worklist* (call it *FCur*) and try to |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 197 | insert it into *FnTree*: check whether *FCur* is equal to one of functions |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 198 | in *FnTree*. If there *is* an equal function in *FnTree* |
| 199 | (call it *FExists*): merge function *FCur* with *FExists*. Otherwise add |
| 200 | the function from the *worklist* to *FnTree*. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 201 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 202 | 3. Once the *worklist* scanning and merging operations are complete, check the |
| 203 | *Deferred* list. If it is not empty: refill the *worklist* contents with |
| 204 | *Deferred* list and redo step 2, if the *Deferred* list is empty, then exit |
| 205 | from method. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 206 | |
| 207 | Comparison and logarithmical search |
| 208 | """"""""""""""""""""""""""""""""""" |
| 209 | Let's recall our task: for every function *F* from module *M*, we have to find |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 210 | equal functions *F`* in the shortest time possible , and merge them into a |
| 211 | single function. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 212 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 213 | Defining total ordering among the functions set allows us to organize |
| 214 | functions into a binary tree. The lookup procedure complexity would be |
| 215 | estimated as O(log(N)) in this case. But how do we define *total-ordering*? |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 216 | |
| 217 | We have to introduce a single rule applicable to every pair of functions, and |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 218 | following this rule, then evaluate which of them is greater. What kind of rule |
| 219 | could it be? Let's declare it as the "compare" method that returns one of 3 |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 220 | possible values: |
| 221 | |
| 222 | -1, left is *less* than right, |
| 223 | |
| 224 | 0, left and right are *equal*, |
| 225 | |
| 226 | 1, left is *greater* than right. |
| 227 | |
| 228 | Of course it means, that we have to maintain |
| 229 | *strict and non-strict order relation properties*: |
| 230 | |
| 231 | * reflexivity (``a <= a``, ``a == a``, ``a >= a``), |
| 232 | * antisymmetry (if ``a <= b`` and ``b <= a`` then ``a == b``), |
| 233 | * transitivity (``a <= b`` and ``b <= c``, then ``a <= c``) |
| 234 | * asymmetry (if ``a < b``, then ``a > b`` or ``a == b``). |
| 235 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 236 | As mentioned before, the comparison routine consists of |
| 237 | "sub-comparison-routines", with each of them also consisting of |
| 238 | "sub-comparison-routines", and so on. Finally, it ends up with primitive |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 239 | comparison. |
| 240 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 241 | Below, we will use the following operations: |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 242 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 243 | #. ``cmpNumbers(number1, number2)`` is a method that returns -1 if left is less |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 244 | than right; 0, if left and right are equal; and 1 otherwise. |
| 245 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 246 | #. ``cmpFlags(flag1, flag2)`` is a hypothetical method that compares two flags. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 247 | The logic is the same as in ``cmpNumbers``, where ``true`` is 1, and |
| 248 | ``false`` is 0. |
| 249 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 250 | The rest of the article is based on *MergeFunctions.cpp* source code |
| 251 | (found in *<llvm_dir>/lib/Transforms/IPO/MergeFunctions.cpp*). We would like |
| 252 | to ask reader to keep this file open, so we could use it as a reference |
| 253 | for further explanations. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 254 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 255 | Now, we're ready to proceed to the next chapter and see how it works. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 256 | |
| 257 | Functions comparison |
| 258 | ==================== |
| 259 | At first, let's define how exactly we compare complex objects. |
| 260 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 261 | Complex object comparison (function, basic-block, etc) is mostly based on its |
| 262 | sub-object comparison results. It is similar to the next "tree" objects |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 263 | comparison: |
| 264 | |
| 265 | #. For two trees *T1* and *T2* we perform *depth-first-traversal* and have |
| 266 | two sequences as a product: "*T1Items*" and "*T2Items*". |
| 267 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 268 | #. We then compare chains "*T1Items*" and "*T2Items*" in |
| 269 | the most-significant-item-first order. The result of items comparison |
| 270 | would be the result of *T1* and *T2* comparison itself. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 271 | |
| 272 | FunctionComparator::compare(void) |
| 273 | --------------------------------- |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 274 | A brief look at the source code tells us that the comparison starts in the |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 275 | “``int FunctionComparator::compare(void)``” method. |
| 276 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 277 | 1. The first parts to be compared are the function's attributes and some |
| 278 | properties that is outside the “attributes” term, but still could make the |
| 279 | function different without changing its body. This part of the comparison is |
| 280 | usually done within simple *cmpNumbers* or *cmpFlags* operations (e.g. |
| 281 | ``cmpFlags(F1->hasGC(), F2->hasGC())``). Below is a full list of function's |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 282 | properties to be compared on this stage: |
| 283 | |
| 284 | * *Attributes* (those are returned by ``Function::getAttributes()`` |
| 285 | method). |
| 286 | |
| 287 | * *GC*, for equivalence, *RHS* and *LHS* should be both either without |
| 288 | *GC* or with the same one. |
| 289 | |
| 290 | * *Section*, just like a *GC*: *RHS* and *LHS* should be defined in the |
| 291 | same section. |
| 292 | |
| 293 | * *Variable arguments*. *LHS* and *RHS* should be both either with or |
| 294 | without *var-args*. |
| 295 | |
| 296 | * *Calling convention* should be the same. |
| 297 | |
| 298 | 2. Function type. Checked by ``FunctionComparator::cmpType(Type*, Type*)`` |
| 299 | method. It checks return type and parameters type; the method itself will be |
| 300 | described later. |
| 301 | |
| 302 | 3. Associate function formal parameters with each other. Then comparing function |
| 303 | bodies, if we see the usage of *LHS*'s *i*-th argument in *LHS*'s body, then, |
| 304 | we want to see usage of *RHS*'s *i*-th argument at the same place in *RHS*'s |
| 305 | body, otherwise functions are different. On this stage we grant the preference |
| 306 | to those we met later in function body (value we met first would be *less*). |
| 307 | This is done by “``FunctionComparator::cmpValues(const Value*, const Value*)``” |
| 308 | method (will be described a bit later). |
| 309 | |
| 310 | 4. Function body comparison. As it written in method comments: |
| 311 | |
| 312 | “We do a CFG-ordered walk since the actual ordering of the blocks in the linked |
| 313 | list is immaterial. Our walk starts at the entry block for both functions, then |
| 314 | takes each block from each terminator in order. As an artifact, this also means |
| 315 | that unreachable blocks are ignored.” |
| 316 | |
| 317 | So, using this walk we get BBs from *left* and *right* in the same order, and |
| 318 | compare them by “``FunctionComparator::compare(const BasicBlock*, const |
| 319 | BasicBlock*)``” method. |
| 320 | |
| 321 | We also associate BBs with each other, like we did it with function formal |
| 322 | arguments (see ``cmpValues`` method below). |
| 323 | |
| 324 | FunctionComparator::cmpType |
| 325 | --------------------------- |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 326 | Consider how type comparison works. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 327 | |
| 328 | 1. Coerce pointer to integer. If left type is a pointer, try to coerce it to the |
| 329 | integer type. It could be done if its address space is 0, or if address spaces |
| 330 | are ignored at all. Do the same thing for the right type. |
| 331 | |
| 332 | 2. If left and right types are equal, return 0. Otherwise we need to give |
| 333 | preference to one of them. So proceed to the next step. |
| 334 | |
| 335 | 3. If types are of different kind (different type IDs). Return result of type |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 336 | IDs comparison, treating them as numbers (use ``cmpNumbers`` operation). |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 337 | |
| 338 | 4. If types are vectors or integers, return result of their pointers comparison, |
| 339 | comparing them as numbers. |
| 340 | |
| 341 | 5. Check whether type ID belongs to the next group (call it equivalent-group): |
| 342 | |
| 343 | * Void |
| 344 | |
| 345 | * Float |
| 346 | |
| 347 | * Double |
| 348 | |
| 349 | * X86_FP80 |
| 350 | |
| 351 | * FP128 |
| 352 | |
| 353 | * PPC_FP128 |
| 354 | |
| 355 | * Label |
| 356 | |
| 357 | * Metadata. |
| 358 | |
| 359 | If ID belongs to group above, return 0. Since it's enough to see that |
| 360 | types has the same ``TypeID``. No additional information is required. |
| 361 | |
| 362 | 6. Left and right are pointers. Return result of address space comparison |
| 363 | (numbers comparison). |
| 364 | |
| 365 | 7. Complex types (structures, arrays, etc.). Follow complex objects comparison |
| 366 | technique (see the very first paragraph of this chapter). Both *left* and |
| 367 | *right* are to be expanded and their element types will be checked the same |
| 368 | way. If we get -1 or 1 on some stage, return it. Otherwise return 0. |
| 369 | |
| 370 | 8. Steps 1-6 describe all the possible cases, if we passed steps 1-6 and didn't |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 371 | get any conclusions, then invoke ``llvm_unreachable``, since it's quite an |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 372 | unexpectable case. |
| 373 | |
| 374 | cmpValues(const Value*, const Value*) |
| 375 | ------------------------------------- |
| 376 | Method that compares local values. |
| 377 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 378 | This method gives us an answer to a very curious question: whether we could |
| 379 | treat local values as equal, and which value is greater otherwise. It's |
| 380 | better to start from example: |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 381 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 382 | Consider the situation when we're looking at the same place in left |
| 383 | function "*FL*" and in right function "*FR*". Every part of *left* place is |
| 384 | equal to the corresponding part of *right* place, and (!) both parts use |
| 385 | *Value* instances, for example: |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 386 | |
Renato Golin | 88ea57f | 2016-07-20 12:16:38 +0000 | [diff] [blame] | 387 | .. code-block:: text |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 388 | |
| 389 | instr0 i32 %LV ; left side, function FL |
| 390 | instr0 i32 %RV ; right side, function FR |
| 391 | |
| 392 | So, now our conclusion depends on *Value* instances comparison. |
| 393 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 394 | The main purpose of this method is to determine relation between such values. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 395 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 396 | What can we expect from equal functions? At the same place, in functions |
| 397 | "*FL*" and "*FR*" we expect to see *equal* values, or values *defined* at |
| 398 | the same place in "*FL*" and "*FR*". |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 399 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 400 | Consider a small example here: |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 401 | |
Renato Golin | 88ea57f | 2016-07-20 12:16:38 +0000 | [diff] [blame] | 402 | .. code-block:: text |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 403 | |
| 404 | define void %f(i32 %pf0, i32 %pf1) { |
| 405 | instr0 i32 %pf0 instr1 i32 %pf1 instr2 i32 123 |
| 406 | } |
| 407 | |
Renato Golin | 88ea57f | 2016-07-20 12:16:38 +0000 | [diff] [blame] | 408 | .. code-block:: text |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 409 | |
| 410 | define void %g(i32 %pg0, i32 %pg1) { |
| 411 | instr0 i32 %pg0 instr1 i32 %pg0 instr2 i32 123 |
| 412 | } |
| 413 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 414 | In this example, *pf0* is associated with *pg0*, *pf1* is associated with |
| 415 | *pg1*, and we also declare that *pf0* < *pf1*, and thus *pg0* < *pf1*. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 416 | |
| 417 | Instructions with opcode "*instr0*" would be *equal*, since their types and |
| 418 | opcodes are equal, and values are *associated*. |
| 419 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 420 | Instructions with opcode "*instr1*" from *f* is *greater* than instructions |
| 421 | with opcode "*instr1*" from *g*; here we have equal types and opcodes, but |
| 422 | "*pf1* is greater than "*pg0*". |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 423 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 424 | Instructions with opcode "*instr2*" are equal, because their opcodes and |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 425 | types are equal, and the same constant is used as a value. |
| 426 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 427 | What we associate in cmpValues? |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 428 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 429 | * Function arguments. *i*-th argument from left function associated with |
| 430 | *i*-th argument from right function. |
| 431 | * BasicBlock instances. In basic-block enumeration loop we associate *i*-th |
| 432 | BasicBlock from the left function with *i*-th BasicBlock from the right |
| 433 | function. |
| 434 | * Instructions. |
| 435 | * Instruction operands. Note, we can meet *Value* here we have never seen |
| 436 | before. In this case it is not a function argument, nor *BasicBlock*, nor |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 437 | *Instruction*. It is a global value. It is a constant, since it's the only |
| 438 | supposed global here. The method also compares: Constants that are of the |
| 439 | same type and if right constant can be losslessly bit-casted to the left |
| 440 | one, then we also compare them. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 441 | |
| 442 | How to implement cmpValues? |
| 443 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 444 | *Association* is a case of equality for us. We just treat such values as equal, |
| 445 | but, in general, we need to implement antisymmetric relation. As mentioned |
| 446 | above, to understand what is *less*, we can use order in which we |
| 447 | meet values. If both values have the same order in a function (met at the same |
| 448 | time), we then treat values as *associated*. Otherwise – it depends on who was |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 449 | first. |
| 450 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 451 | Every time we run the top-level compare method, we initialize two identical |
| 452 | maps (one for the left side, another one for the right side): |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 453 | |
| 454 | ``map<Value, int> sn_mapL, sn_mapR;`` |
| 455 | |
| 456 | The key of the map is the *Value* itself, the *value* – is its order (call it |
| 457 | *serial number*). |
| 458 | |
| 459 | To add value *V* we need to perform the next procedure: |
| 460 | |
| 461 | ``sn_map.insert(std::make_pair(V, sn_map.size()));`` |
| 462 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 463 | For the first *Value*, map will return *0*, for the second *Value* map will |
| 464 | return *1*, and so on. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 465 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 466 | We can then check whether left and right values met at the same time with |
| 467 | a simple comparison: |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 468 | |
| 469 | ``cmpNumbers(sn_mapL[Left], sn_mapR[Right]);`` |
| 470 | |
| 471 | Of course, we can combine insertion and comparison: |
| 472 | |
| 473 | .. code-block:: c++ |
| 474 | |
| 475 | std::pair<iterator, bool> |
| 476 | LeftRes = sn_mapL.insert(std::make_pair(Left, sn_mapL.size())), RightRes |
| 477 | = sn_mapR.insert(std::make_pair(Right, sn_mapR.size())); |
| 478 | return cmpNumbers(LeftRes.first->second, RightRes.first->second); |
| 479 | |
| 480 | Let's look, how whole method could be implemented. |
| 481 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 482 | 1. We have to start with the bad news. Consider function self and |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 483 | cross-referencing cases: |
| 484 | |
| 485 | .. code-block:: c++ |
| 486 | |
| 487 | // self-reference unsigned fact0(unsigned n) { return n > 1 ? n |
| 488 | * fact0(n-1) : 1; } unsigned fact1(unsigned n) { return n > 1 ? n * |
| 489 | fact1(n-1) : 1; } |
| 490 | |
| 491 | // cross-reference unsigned ping(unsigned n) { return n!= 0 ? pong(n-1) : 0; |
| 492 | } unsigned pong(unsigned n) { return n!= 0 ? ping(n-1) : 0; } |
| 493 | |
| 494 | .. |
| 495 | |
| 496 | This comparison has been implemented in initial *MergeFunctions* pass |
| 497 | version. But, unfortunately, it is not transitive. And this is the only case |
| 498 | we can't convert to less-equal-greater comparison. It is a seldom case, 4-5 |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 499 | functions of 10000 (checked in test-suite), and, we hope, the reader would |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 500 | forgive us for such a sacrifice in order to get the O(log(N)) pass time. |
| 501 | |
| 502 | 2. If left/right *Value* is a constant, we have to compare them. Return 0 if it |
| 503 | is the same constant, or use ``cmpConstants`` method otherwise. |
| 504 | |
| 505 | 3. If left/right is *InlineAsm* instance. Return result of *Value* pointers |
| 506 | comparison. |
| 507 | |
| 508 | 4. Explicit association of *L* (left value) and *R* (right value). We need to |
| 509 | find out whether values met at the same time, and thus are *associated*. Or we |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 510 | need to put the rule: when we treat *L* < *R*. Now it is easy: we just return |
| 511 | the result of numbers comparison: |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 512 | |
| 513 | .. code-block:: c++ |
| 514 | |
| 515 | std::pair<iterator, bool> |
| 516 | LeftRes = sn_mapL.insert(std::make_pair(Left, sn_mapL.size())), |
| 517 | RightRes = sn_mapR.insert(std::make_pair(Right, sn_mapR.size())); |
| 518 | if (LeftRes.first->second == RightRes.first->second) return 0; |
| 519 | if (LeftRes.first->second < RightRes.first->second) return -1; |
| 520 | return 1; |
| 521 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 522 | Now when *cmpValues* returns 0, we can proceed the comparison procedure. |
| 523 | Otherwise, if we get (-1 or 1), we need to pass this result to the top level, |
| 524 | and finish comparison procedure. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 525 | |
| 526 | cmpConstants |
| 527 | ------------ |
| 528 | Performs constants comparison as follows: |
| 529 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 530 | 1. Compare constant types using ``cmpType`` method. If the result is -1 or 1, |
| 531 | goto step 2, otherwise proceed to step 3. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 532 | |
| 533 | 2. If types are different, we still can check whether constants could be |
| 534 | losslessly bitcasted to each other. The further explanation is modification of |
| 535 | ``canLosslesslyBitCastTo`` method. |
| 536 | |
| 537 | 2.1 Check whether constants are of the first class types |
| 538 | (``isFirstClassType`` check): |
| 539 | |
| 540 | 2.1.1. If both constants are *not* of the first class type: return result |
| 541 | of ``cmpType``. |
| 542 | |
| 543 | 2.1.2. Otherwise, if left type is not of the first class, return -1. If |
| 544 | right type is not of the first class, return 1. |
| 545 | |
| 546 | 2.1.3. If both types are of the first class type, proceed to the next step |
| 547 | (2.1.3.1). |
| 548 | |
| 549 | 2.1.3.1. If types are vectors, compare their bitwidth using the |
| 550 | *cmpNumbers*. If result is not 0, return it. |
| 551 | |
| 552 | 2.1.3.2. Different types, but not a vectors: |
| 553 | |
| 554 | * if both of them are pointers, good for us, we can proceed to step 3. |
| 555 | * if one of types is pointer, return result of *isPointer* flags |
| 556 | comparison (*cmpFlags* operation). |
| 557 | * otherwise we have no methods to prove bitcastability, and thus return |
| 558 | result of types comparison (-1 or 1). |
| 559 | |
| 560 | Steps below are for the case when types are equal, or case when constants are |
| 561 | bitcastable: |
| 562 | |
| 563 | 3. One of constants is a "*null*" value. Return the result of |
| 564 | ``cmpFlags(L->isNullValue, R->isNullValue)`` comparison. |
| 565 | |
| 566 | 4. Compare value IDs, and return result if it is not 0: |
| 567 | |
| 568 | .. code-block:: c++ |
| 569 | |
| 570 | if (int Res = cmpNumbers(L->getValueID(), R->getValueID())) |
| 571 | return Res; |
| 572 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 573 | 5. Compare the contents of constants. The comparison depends on the kind of |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 574 | constants, but on this stage it is just a lexicographical comparison. Just see |
| 575 | how it was described in the beginning of "*Functions comparison*" paragraph. |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 576 | Mathematically, it is equal to the next case: we encode left constant and right |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 577 | constant (with similar way *bitcode-writer* does). Then compare left code |
| 578 | sequence and right code sequence. |
| 579 | |
| 580 | compare(const BasicBlock*, const BasicBlock*) |
| 581 | --------------------------------------------- |
| 582 | Compares two *BasicBlock* instances. |
| 583 | |
| 584 | It enumerates instructions from left *BB* and right *BB*. |
| 585 | |
| 586 | 1. It assigns serial numbers to the left and right instructions, using |
| 587 | ``cmpValues`` method. |
| 588 | |
| 589 | 2. If one of left or right is *GEP* (``GetElementPtr``), then treat *GEP* as |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 590 | greater than other instructions. If both instructions are *GEPs* use ``cmpGEP`` |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 591 | method for comparison. If result is -1 or 1, pass it to the top-level |
| 592 | comparison (return it). |
| 593 | |
| 594 | 3.1. Compare operations. Call ``cmpOperation`` method. If result is -1 or |
| 595 | 1, return it. |
| 596 | |
| 597 | 3.2. Compare number of operands, if result is -1 or 1, return it. |
| 598 | |
| 599 | 3.3. Compare operands themselves, use ``cmpValues`` method. Return result |
| 600 | if it is -1 or 1. |
| 601 | |
| 602 | 3.4. Compare type of operands, using ``cmpType`` method. Return result if |
| 603 | it is -1 or 1. |
| 604 | |
| 605 | 3.5. Proceed to the next instruction. |
| 606 | |
| 607 | 4. We can finish instruction enumeration in 3 cases: |
| 608 | |
| 609 | 4.1. We reached the end of both left and right basic-blocks. We didn't |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 610 | exit on steps 1-3, so contents are equal, return 0. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 611 | |
| 612 | 4.2. We have reached the end of the left basic-block. Return -1. |
| 613 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 614 | 4.3. Return 1 (we reached the end of the right basic block). |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 615 | |
| 616 | cmpGEP |
| 617 | ------ |
| 618 | Compares two GEPs (``getelementptr`` instructions). |
| 619 | |
| 620 | It differs from regular operations comparison with the only thing: possibility |
| 621 | to use ``accumulateConstantOffset`` method. |
| 622 | |
| 623 | So, if we get constant offset for both left and right *GEPs*, then compare it as |
| 624 | numbers, and return comparison result. |
| 625 | |
| 626 | Otherwise treat it like a regular operation (see previous paragraph). |
| 627 | |
| 628 | cmpOperation |
| 629 | ------------ |
| 630 | Compares instruction opcodes and some important operation properties. |
| 631 | |
| 632 | 1. Compare opcodes, if it differs return the result. |
| 633 | |
| 634 | 2. Compare number of operands. If it differs – return the result. |
| 635 | |
| 636 | 3. Compare operation types, use *cmpType*. All the same – if types are |
| 637 | different, return result. |
| 638 | |
| 639 | 4. Compare *subclassOptionalData*, get it with ``getRawSubclassOptionalData`` |
| 640 | method, and compare it like a numbers. |
| 641 | |
| 642 | 5. Compare operand types. |
| 643 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 644 | 6. For some particular instructions, check equivalence (relation in our case) of |
| 645 | some significant attributes. For example, we have to compare alignment for |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 646 | ``load`` instructions. |
| 647 | |
| 648 | O(log(N)) |
| 649 | --------- |
| 650 | Methods described above implement order relationship. And latter, could be used |
| 651 | for nodes comparison in a binary tree. So we can organize functions set into |
| 652 | the binary tree and reduce the cost of lookup procedure from |
| 653 | O(N*N) to O(log(N)). |
| 654 | |
| 655 | Merging process, mergeTwoFunctions |
| 656 | ================================== |
| 657 | Once *MergeFunctions* detected that current function (*G*) is equal to one that |
| 658 | were analyzed before (function *F*) it calls ``mergeTwoFunctions(Function*, |
| 659 | Function*)``. |
| 660 | |
| 661 | Operation affects ``FnTree`` contents with next way: *F* will stay in |
| 662 | ``FnTree``. *G* being equal to *F* will not be added to ``FnTree``. Calls of |
| 663 | *G* would be replaced with something else. It changes bodies of callers. So, |
| 664 | functions that calls *G* would be put into ``Deferred`` set and removed from |
| 665 | ``FnTree``, and analyzed again. |
| 666 | |
| 667 | The approach is next: |
| 668 | |
| 669 | 1. Most wished case: when we can use alias and both of *F* and *G* are weak. We |
| 670 | make both of them with aliases to the third strong function *H*. Actually *H* |
| 671 | is *F*. See below how it's made (but it's better to look straight into the |
| 672 | source code). Well, this is a case when we can just replace *G* with *F* |
| 673 | everywhere, we use ``replaceAllUsesWith`` operation here (*RAUW*). |
| 674 | |
| 675 | 2. *F* could not be overridden, while *G* could. It would be good to do the |
| 676 | next: after merging the places where overridable function were used, still use |
| 677 | overridable stub. So try to make *G* alias to *F*, or create overridable tail |
| 678 | call wrapper around *F* and replace *G* with that call. |
| 679 | |
| 680 | 3. Neither *F* nor *G* could be overridden. We can't use *RAUW*. We can just |
| 681 | change the callers: call *F* instead of *G*. That's what |
| 682 | ``replaceDirectCallers`` does. |
| 683 | |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 684 | Below is a detailed body description. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 685 | |
| 686 | If “F” may be overridden |
| 687 | ------------------------ |
| 688 | As follows from ``mayBeOverridden`` comments: “whether the definition of this |
Sylvestre Ledru | 3c5ec72 | 2016-02-14 20:16:22 +0000 | [diff] [blame] | 689 | global may be replaced by something non-equivalent at link time”. If so, that's |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 690 | ok: we can use alias to *F* instead of *G* or change call instructions itself. |
| 691 | |
| 692 | HasGlobalAliases, removeUsers |
| 693 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 694 | First consider the case when we have global aliases of one function name to |
| 695 | another. Our purpose is make both of them with aliases to the third strong |
| 696 | function. Though if we keep *F* alive and without major changes we can leave it |
| 697 | in ``FnTree``. Try to combine these two goals. |
| 698 | |
| 699 | Do stub replacement of *F* itself with an alias to *F*. |
| 700 | |
| 701 | 1. Create stub function *H*, with the same name and attributes like function |
| 702 | *F*. It takes maximum alignment of *F* and *G*. |
| 703 | |
| 704 | 2. Replace all uses of function *F* with uses of function *H*. It is the two |
| 705 | steps procedure instead. First of all, we must take into account, all functions |
| 706 | from whom *F* is called would be changed: since we change the call argument |
| 707 | (from *F* to *H*). If so we must to review these caller functions again after |
| 708 | this procedure. We remove callers from ``FnTree``, method with name |
| 709 | ``removeUsers(F)`` does that (don't confuse with ``replaceAllUsesWith``): |
| 710 | |
| 711 | 2.1. ``Inside removeUsers(Value* |
| 712 | V)`` we go through the all values that use value *V* (or *F* in our context). |
| 713 | If value is instruction, we go to function that holds this instruction and |
| 714 | mark it as to-be-analyzed-again (put to ``Deferred`` set), we also remove |
| 715 | caller from ``FnTree``. |
| 716 | |
| 717 | 2.2. Now we can do the replacement: call ``F->replaceAllUsesWith(H)``. |
| 718 | |
| 719 | 3. *H* (that now "officially" plays *F*'s role) is replaced with alias to *F*. |
| 720 | Do the same with *G*: replace it with alias to *F*. So finally everywhere *F* |
| 721 | was used, we use *H* and it is alias to *F*, and everywhere *G* was used we |
| 722 | also have alias to *F*. |
| 723 | |
| 724 | 4. Set *F* linkage to private. Make it strong :-) |
| 725 | |
| 726 | No global aliases, replaceDirectCallers |
| 727 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 728 | If global aliases are not supported. We call ``replaceDirectCallers``. Just |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 729 | go through all calls of *G* and replace it with calls of *F*. If you look into |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 730 | the method you will see that it scans all uses of *G* too, and if use is callee |
| 731 | (if user is call instruction and *G* is used as what to be called), we replace |
| 732 | it with use of *F*. |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 733 | |
| 734 | If “F” could not be overridden, fix it! |
| 735 | """"""""""""""""""""""""""""""""""""""" |
| 736 | |
| 737 | We call ``writeThunkOrAlias(Function *F, Function *G)``. Here we try to replace |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 738 | *G* with alias to *F* first. The next conditions are essential: |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 739 | |
| 740 | * target should support global aliases, |
| 741 | * the address itself of *G* should be not significant, not named and not |
| 742 | referenced anywhere, |
| 743 | * function should come with external, local or weak linkage. |
| 744 | |
| 745 | Otherwise we write thunk: some wrapper that has *G's* interface and calls *F*, |
| 746 | so *G* could be replaced with this wrapper. |
| 747 | |
| 748 | *writeAlias* |
| 749 | |
| 750 | As follows from *llvm* reference: |
| 751 | |
| 752 | “Aliases act as *second name* for the aliasee value”. So we just want to create |
Aditya Kumar | a83fad8 | 2018-08-18 20:17:19 +0000 | [diff] [blame] | 753 | a second name for *F* and use it instead of *G*: |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 754 | |
| 755 | 1. create global alias itself (*GA*), |
| 756 | |
| 757 | 2. adjust alignment of *F* so it must be maximum of current and *G's* alignment; |
| 758 | |
| 759 | 3. replace uses of *G*: |
| 760 | |
| 761 | 3.1. first mark all callers of *G* as to-be-analyzed-again, using |
| 762 | ``removeUsers`` method (see chapter above), |
| 763 | |
| 764 | 3.2. call ``G->replaceAllUsesWith(GA)``. |
| 765 | |
| 766 | 4. Get rid of *G*. |
| 767 | |
| 768 | *writeThunk* |
| 769 | |
| 770 | As it written in method comments: |
| 771 | |
| 772 | “Replace G with a simple tail call to bitcast(F). Also replace direct uses of G |
| 773 | with bitcast(F). Deletes G.” |
| 774 | |
| 775 | In general it does the same as usual when we want to replace callee, except the |
| 776 | first point: |
| 777 | |
| 778 | 1. We generate tail call wrapper around *F*, but with interface that allows use |
| 779 | it instead of *G*. |
| 780 | |
| 781 | 2. “As-usual”: ``removeUsers`` and ``replaceAllUsesWith`` then. |
| 782 | |
| 783 | 3. Get rid of *G*. |
| 784 | |
Stepan Dyatkovskiy | 4912640 | 2014-12-10 17:42:01 +0000 | [diff] [blame] | 785 | |