| # Debug Info Module |
| |
| This module serves the purpose of generating debug symbols. We use LLVM's |
| [source level debugging](https://llvm.org/docs/SourceLevelDebugging.html) |
| features for generating the debug information. The general principle is |
| this: |
| |
| Given the right metadata in the LLVM IR, the LLVM code generator is able to |
| create DWARF debug symbols for the given code. The |
| [metadata](https://llvm.org/docs/LangRef.html#metadata-type) is structured |
| much like DWARF *debugging information entries* (DIE), representing type |
| information such as datatype layout, function signatures, block layout, |
| variable location and scope information, etc. It is the purpose of this |
| module to generate correct metadata and insert it into the LLVM IR. |
| |
| As the exact format of metadata trees may change between different LLVM |
| versions, we now use LLVM |
| [DIBuilder](https://llvm.org/docs/doxygen/html/classllvm_1_1DIBuilder.html) |
| to create metadata where possible. This will hopefully ease the adaption of |
| this module to future LLVM versions. |
| |
| The public API of the module is a set of functions that will insert the |
| correct metadata into the LLVM IR when called with the right parameters. |
| The module is thus driven from an outside client with functions like |
| `debuginfo::create_local_var_metadata(bx: block, local: &ast::local)`. |
| |
| Internally the module will try to reuse already created metadata by |
| utilizing a cache. The way to get a shared metadata node when needed is |
| thus to just call the corresponding function in this module: |
| |
| let file_metadata = file_metadata(cx, file); |
| |
| The function will take care of probing the cache for an existing node for |
| that exact file path. |
| |
| All private state used by the module is stored within either the |
| CrateDebugContext struct (owned by the CodegenCx) or the |
| FunctionDebugContext (owned by the FunctionCx). |
| |
| This file consists of three conceptual sections: |
| 1. The public interface of the module |
| 2. Module-internal metadata creation functions |
| 3. Minor utility functions |
| |
| |
| ## Recursive Types |
| |
| Some kinds of types, such as structs and enums can be recursive. That means |
| that the type definition of some type X refers to some other type which in |
| turn (transitively) refers to X. This introduces cycles into the type |
| referral graph. A naive algorithm doing an on-demand, depth-first traversal |
| of this graph when describing types, can get trapped in an endless loop |
| when it reaches such a cycle. |
| |
| For example, the following simple type for a singly-linked list... |
| |
| ``` |
| struct List { |
| value: i32, |
| tail: Option<Box<List>>, |
| } |
| ``` |
| |
| will generate the following callstack with a naive DFS algorithm: |
| |
| ``` |
| describe(t = List) |
| describe(t = i32) |
| describe(t = Option<Box<List>>) |
| describe(t = Box<List>) |
| describe(t = List) // at the beginning again... |
| ... |
| ``` |
| |
| To break cycles like these, we use "forward declarations". That is, when |
| the algorithm encounters a possibly recursive type (any struct or enum), it |
| immediately creates a type description node and inserts it into the cache |
| *before* describing the members of the type. This type description is just |
| a stub (as type members are not described and added to it yet) but it |
| allows the algorithm to already refer to the type. After the stub is |
| inserted into the cache, the algorithm continues as before. If it now |
| encounters a recursive reference, it will hit the cache and does not try to |
| describe the type anew. |
| |
| This behavior is encapsulated in the 'RecursiveTypeDescription' enum, |
| which represents a kind of continuation, storing all state needed to |
| continue traversal at the type members after the type has been registered |
| with the cache. (This implementation approach might be a tad over- |
| engineered and may change in the future) |
| |
| |
| ## Source Locations and Line Information |
| |
| In addition to data type descriptions the debugging information must also |
| allow to map machine code locations back to source code locations in order |
| to be useful. This functionality is also handled in this module. The |
| following functions allow to control source mappings: |
| |
| + `set_source_location()` |
| + `clear_source_location()` |
| + `start_emitting_source_locations()` |
| |
| `set_source_location()` allows to set the current source location. All IR |
| instructions created after a call to this function will be linked to the |
| given source location, until another location is specified with |
| `set_source_location()` or the source location is cleared with |
| `clear_source_location()`. In the later case, subsequent IR instruction |
| will not be linked to any source location. As you can see, this is a |
| stateful API (mimicking the one in LLVM), so be careful with source |
| locations set by previous calls. It's probably best to not rely on any |
| specific state being present at a given point in code. |
| |
| One topic that deserves some extra attention is *function prologues*. At |
| the beginning of a function's machine code there are typically a few |
| instructions for loading argument values into allocas and checking if |
| there's enough stack space for the function to execute. This *prologue* is |
| not visible in the source code and LLVM puts a special PROLOGUE END marker |
| into the line table at the first non-prologue instruction of the function. |
| In order to find out where the prologue ends, LLVM looks for the first |
| instruction in the function body that is linked to a source location. So, |
| when generating prologue instructions we have to make sure that we don't |
| emit source location information until the 'real' function body begins. For |
| this reason, source location emission is disabled by default for any new |
| function being codegened and is only activated after a call to the third |
| function from the list above, `start_emitting_source_locations()`. This |
| function should be called right before regularly starting to codegen the |
| top-level block of the given function. |
| |
| There is one exception to the above rule: `llvm.dbg.declare` instruction |
| must be linked to the source location of the variable being declared. For |
| function parameters these `llvm.dbg.declare` instructions typically occur |
| in the middle of the prologue, however, they are ignored by LLVM's prologue |
| detection. The `create_argument_metadata()` and related functions take care |
| of linking the `llvm.dbg.declare` instructions to the correct source |
| locations even while source location emission is still disabled, so there |
| is no need to do anything special with source location handling here. |
| |
| ## Unique Type Identification |
| |
| In order for link-time optimization to work properly, LLVM needs a unique |
| type identifier that tells it across compilation units which types are the |
| same as others. This type identifier is created by |
| `TypeMap::get_unique_type_id_of_type()` using the following algorithm: |
| |
| 1. Primitive types have their name as ID |
| |
| 2. Structs, enums and traits have a multipart identifier |
| |
| 1. The first part is the SVH (strict version hash) of the crate they |
| were originally defined in |
| |
| 2. The second part is the ast::NodeId of the definition in their |
| original crate |
| |
| 3. The final part is a concatenation of the type IDs of their concrete |
| type arguments if they are generic types. |
| |
| 3. Tuple-, pointer-, and function types are structurally identified, which |
| means that they are equivalent if their component types are equivalent |
| (i.e., `(i32, i32)` is the same regardless in which crate it is used). |
| |
| This algorithm also provides a stable ID for types that are defined in one |
| crate but instantiated from metadata within another crate. We just have to |
| take care to always map crate and `NodeId`s back to the original crate |
| context. |
| |
| As a side-effect these unique type IDs also help to solve a problem arising |
| from lifetime parameters. Since lifetime parameters are completely omitted |
| in debuginfo, more than one `Ty` instance may map to the same debuginfo |
| type metadata, that is, some struct `Struct<'a>` may have N instantiations |
| with different concrete substitutions for `'a`, and thus there will be N |
| `Ty` instances for the type `Struct<'a>` even though it is not generic |
| otherwise. Unfortunately this means that we cannot use `ty::type_id()` as |
| cheap identifier for type metadata -- we have done this in the past, but it |
| led to unnecessary metadata duplication in the best case and LLVM |
| assertions in the worst. However, the unique type ID as described above |
| *can* be used as identifier. Since it is comparatively expensive to |
| construct, though, `ty::type_id()` is still used additionally as an |
| optimization for cases where the exact same type has been seen before |
| (which is most of the time). |