| # Debug Generation |
| |
| Application developers spend a significant amount of time debugging the |
| applications that they create. Hence it is important that a compiler provide support for a good |
| debug experience. DWARF[1] is the standard debugging file format used by |
| compilers and debuggers. The LLVM infrastructure supports debug info generation |
| using metadata[2]. Support for generating debug metadata is present |
| in MLIR by way of MLIR attributes. Flang can leverage these MLIR attributes to |
| generate good debug information. |
| |
| We can break the work for debug generation into two separate tasks: |
| |
| 1. Line table generation |
| 2. Full debug generation |
| |
| The support for Fortran debug info in the LLVM infrastructure[3] has made great |
| progress because many Fortran frontends have adopted LLVM as their backend and |
| because of the availability of the Classic Flang compiler. |
| |
| ## Driver Flags |
| By default, Flang will not generate any debug or line table information. |
| Debug information will be generated if one of the following flags is present. |
| |
| - `-gline-tables-only`, `-g1`: Emit debug line number tables only |
| - `-g`: Emit full debug info |
| |
| ## Line Table Generation |
| |
| There is an existing `AddDebugFoundationPass` which adds a `FusedLoc` with a |
| `SubprogramAttr` on `FuncOp`. This allows MLIR to generate LLVM IR metadata |
| for that function. However, the following values are hardcoded at the moment. |
| These will instead be passed from the driver. |
| |
| - Details of the compiler (name, version and git hash). |
| - Language standard. We can set it to Fortran95 for now and periodically |
| revise it when full support for later standards is available. |
| - Optimisation level. |
| - Type of debug info generated (line table or full debug). |
| - Calling convention: `DW_CC_normal` by default and `DW_CC_program` if it is |
| the main program. |
| |
| `DISubroutineTypeAttr` currently has a fixed type. This will be changed to |
| match the signature of the actual function/subroutine. |
| |
| |
| ## Full Debug Generation |
| |
| Full debug info will include metadata to describe functions, variables and |
| types. Flang will generate debug metadata in the form of MLIR attributes. These |
| attributes will be converted to the format expected by LLVM IR in DebugTranslation[4]. |
| |
| Debug metadata generation can be broken down into two steps. |
| |
| 1. MLIR attributes are generated by reading information from AST or FIR. This |
| step can happen anytime before or during conversion to LLVM dialect. An example |
| of the metadata generated in this step is `DILocalVariableAttr` or |
| `DIDerivedTypeAttr`. |
| |
| 2. Changes that can only happen during or after conversion to LLVM dialect. An |
| example of this is passing `DIGlobalVariableExpressionAttr` while |
| creating `LLVM::GlobalOp`. Another example is the generation of `DbgDeclareOp`, |
| which is required for local variables. It can only be created after conversion |
| to LLVM dialect as it requires the LLVM pointer type. The changes required for |
| step 2 are quite minimal. The bulk of the work happens in step 1. |
| |
| One design decision that we need to make is where to perform step 1. |
| Here are some possible options: |
| |
| **During conversion to LLVM dialect** |
| |
| Pros: |
| 1. Steps 1 and 2 are done in one place. |
| 2. No chance of missing any change introduced by an earlier transformation. |
| |
| Cons: |
| 1. Passing a lot of information from the driver as discussed in the line table |
| section above may muddle the interface of FIRToLLVMConversion. |
| 2. `DeclareOp` is removed before this pass. |
| 3. Even if `DeclareOp` is retained, creating debug metadata while some ops have |
| been converted to LLVM dialect and others have not may cause its own issues. We |
| have to walk the op chain to extract the information, which may be problematic |
| in this case. |
| 4. Some source information is lost by this point. Examples include |
| information about namelists and source line information about fields of derived |
| types. |
| |
| **During a pass before conversion to LLVM dialect** |
| |
| This is similar to what AddDebugFoundationPass is currently doing. |
| |
| Pros: |
| 1. One central location dedicated to debug information processing. This can |
| result in a cleaner implementation. |
| 2. Similar to above, less chance of missing any change introduced by an earlier |
| transformation. |
| |
| Cons: |
| 1. Step 2 still needs to happen during conversion to LLVM dialect, but the |
| changes required for step 2 are quite minimal. |
| 2. Similar to above, some source information may be lost by this point. |
| |
| **During Lowering from AST** |
| |
| Pros: |
| 1. We have better source information. |
| |
| Cons: |
| 1. There may be changes in the code after lowering which may not be |
| reflected in the debug information. |
| 2. Comments on an earlier PR [5] advised against this approach. |
| |
| ## Design |
| |
| The design below assumes that we are extracting the information from FIR. |
| If we generate debug metadata during lowering then the description below |
| may need to change, although the generated metadata remains the same in |
| both cases. |
| |
| The AddDebugFoundationPass will be renamed to AddDebugInfo Pass. The |
| information mentioned in the line table section above will be passed to it from |
| the driver. This pass will run quite late in the pipeline but before |
| `DeclareOp` is removed. |
| |
| In this pass, we will iterate through the `GlobalOp`, `TypeInfoOp`, `FuncOp` |
| and `DeclareOp` to extract the source information and build the MLIR |
| attributes. A class will be added to handle conversion of MLIR and FIR types to |
| `DITypeAttr`. |
| |
| The following sections provide details of how various language constructs will be |
| handled. In these sections, the LLVM IR metadata and MLIR attributes have been |
| used interchangeably. As an example, `DILocalVariableAttr` is an MLIR attribute |
| which gets translated to LLVM IR's `DILocalVariable`. |
| |
| ### Variables |
| |
| #### Local Variables |
| In MLIR, local variables are represented by `DILocalVariableAttr` which |
| stores information like source location and type. They also require a |
| `DbgDeclareOp` which binds `DILocalVariableAttr` with a location. |
| |
| In FIR, `DeclareOp` has source information about the variable. The |
| `DeclareOp` will be processed to create `DILocalVariableAttr`. This attr is |
| attached to the memref op of the `DeclareOp` using a `FusedLoc` approach. |
| |
| During conversion to LLVM dialect, when an op is encountered that has a |
| `DILocalVariableAttr` in its `FusedLoc`, a `DbgDeclareOp` is created which |
| binds the attr with its location. |
| |
| The change in the IR looks as follows: |
| |
| ``` |
| // Original FIR |
| %2 = fir.alloca i32 loc(#loc4) |
| %3 = fir.declare %2 {uniq_name = "_QMhelperFchangeEi"} |
| |
| // FIR with FusedLoc |
| |
| %2 = fir.alloca i32 loc(#loc38) |
| %3 = fir.declare %2 {uniq_name = "_QMhelperFchangeEi"} |
| #di_local_variable5 = #llvm.di_local_variable<name = "i", line = 5, type = #di_basic_type ... > |
| #loc38 = loc(fused<#di_local_variable5>[#loc4]) |
| |
| // After conversion to LLVM dialect |
| |
| #di_local_variable = #llvm.di_local_variable<name = "i", line = 5, type = #di_basic_type ...> |
| %1 = llvm.alloca %0 x i64 |
| llvm.intr.dbg.declare #di_local_variable = %1 |
| ``` |
| |
| #### Function Arguments |
| |
| Arguments work in a similar way, but they present a difficulty: the `DeclareOp`'s |
| memref points to a `BlockArgument`. Unlike the op in the local variable case, |
| a `BlockArgument` is not handled by FIRToLLVMLowering. This can easily be |
| handled by adding the `DbgDeclareOp` after conversion to LLVM dialect, either in |
| FIRToLLVMLowering or in a separate pass. |
| |
| ### Module |
| |
| In debug metadata, the Fortran module will be represented by `DIModuleAttr`. |
| The variables or functions inside a module will have a scope pointing to the parent module. |
| |
| ``` |
| module helper |
| real glr |
| ... |
| end module helper |
| |
| !1 = !DICompileUnit(language: DW_LANG_Fortran90 ...) |
| !2 = !DIModule(scope: !1, name: "helper" ...) |
| !3 = !DIGlobalVariable(scope: !2, name: "glr" ...) |
| |
| ; Use of a module results in the following metadata. |
| !4 = !DIImportedEntity(tag: DW_TAG_imported_module, entity: !2) |
| ``` |
| |
| Modules are not first-class entities in FIR, so there is no way to get |
| the location where they are declared in the source file. |
| |
| But the information that a variable or function is part of a module |
| can be extracted from its mangled name, along with the name of the module. There is |
| a `GlobalOp` generated for each module variable in FIR and there is also a |
| `DeclareOp` in each function where the module variable is used. |
| |
| We will use the `GlobalOp` to generate the `DIModuleAttr` and associated |
| `DIGlobalVariableAttr`. A `DeclareOp` for a module variable will be used |
| to generate `DIImportedEntityAttr`. Care will be taken to avoid generating |
| duplicate `DIImportedEntityAttr` entries in the same function. |
| |
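The name-based lookup described above can be modelled with a short Python sketch. It is only an illustration, not Flang code: the helper `split_mangled_name` is hypothetical, and it assumes a simplified view of the mangling scheme in which `_Q` is the prefix and the upper-case markers `M`, `F` and `E` introduce a module, a procedure and a variable name respectively. Any other marker is ignored here.

```python
import re


def split_mangled_name(name):
    """Split a mangled name into (modules, procedures, entity).

    Simplified model (an assumption for illustration): only the M (module),
    F (procedure) and E (variable) markers are recognised.
    """
    if not name.startswith("_Q"):
        return None  # not a mangled Fortran name
    modules, procedures, entity = [], [], None
    # Identifiers are lower case, so an upper-case letter starts a new part.
    for marker, ident in re.findall(r"([MFE])([a-z0-9_]+)", name[2:]):
        if marker == "M":
            modules.append(ident)
        elif marker == "F":
            procedures.append(ident)
        else:
            entity = ident
    return modules, procedures, entity


print(split_mangled_name("_QMhelperFchangeEi"))
print(split_mangled_name("_QMhelperEglr"))
```

For `_QMhelperFchangeEi` this yields the module `helper`, the procedure `change` and the variable `i`. A variable whose name has a module part but no procedure part, such as `_QMhelperEglr`, is a module variable and would get the `DIModuleAttr` as its scope.
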
| ### Derived Types |
| |
| A derived type will be represented in metadata by `DICompositeType` with a tag of |
| `DW_TAG_structure_type`. It will have elements which point to the components. |
| |
| ``` |
| type :: t_pair |
| integer :: i |
| real :: x |
| end type |
| !1 = !DICompositeType(tag: DW_TAG_structure_type, name: "t_pair", elements: !2 ...) |
| !2 = !{!3, !4} |
| !3 = !DIDerivedType(tag: DW_TAG_member, scope: !1, name: "i", size: 32, offset: 0, baseType: !5 ...) |
| !4 = !DIDerivedType(tag: DW_TAG_member, scope: !1, name: "x", size: 32, offset: 32, baseType: !6 ...) |
| !5 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...) |
| !6 = !DIBasicType(tag: DW_TAG_base_type, name: "real" ...) |
| ``` |
| |
| In FIR, `RecordType` and `TypeInfoOp` can be used to get information about the |
| location of the derived type and the types of its components. We may also use |
| `FusedLoc` on `TypeInfoOp` to encode location information for all the components |
| of the derived type. |
| |
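As a rough illustration of where the `size` and `offset` values of the `DW_TAG_member` entries come from, the Python sketch below lays out components in declaration order with natural alignment. The helper `layout_members` and the alignment values are assumptions made for this example. In the actual pass the numbers would come from the target data layout.

```python
def layout_members(members):
    """Return (name, size_bits, offset_bits) for each (name, size_bits, align_bits)."""
    laid_out, offset = [], 0
    for name, size_bits, align_bits in members:
        # Round the running offset up to the member's alignment.
        offset = (offset + align_bits - 1) // align_bits * align_bits
        laid_out.append((name, size_bits, offset))
        offset += size_bits
    return laid_out


# t_pair from the example above: a 32-bit integer followed by a 32-bit real.
t_pair = [("i", 32, 32), ("x", 32, 32)]
for name, size, offset in layout_members(t_pair):
    print(f'DW_TAG_member name: "{name}", size: {size}, offset: {offset}')
```

For `t_pair` this gives offset 0 for `i` and offset 32 for `x`, which matches the metadata shown above.
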
| ### CommonBlocks |
| |
| A common block will be represented in metadata by `DICommonBlockAttr` which |
| will be used as scope by the variables inside the common block. `DIExpression` |
| can be used to give the offset of any given variable inside the global storage |
| for the common block. |
| |
| ``` |
| integer a, b |
| common /test/ a, b |
| |
| ;@test_ = common global [8 x i8] zeroinitializer, !dbg !5, !dbg !8 |
| !1 = !DISubprogram() |
| !2 = !DICommonBlock(scope: !1, name: "test" ...) |
| !3 = !DIGlobalVariable(scope: !2, name: "a" ...) |
| !4 = !DIExpression() |
| !5 = !DIGlobalVariableExpression(var: !3, expr: !4) |
| !6 = !DIGlobalVariable(scope: !2, name: "b" ...) |
| !7 = !DIExpression(DW_OP_plus_uconst, 4) |
| !8 = !DIGlobalVariableExpression(var: !6, expr: !7) |
| ``` |
| |
| In FIR, a common block results in a `GlobalOp` with common linkage. Every |
| function where the common block is used has a `DeclareOp` for each common |
| block variable. This `DeclareOp` will point to global storage through |
| `CoordinateOp` and `AddrOfOp`. The `CoordinateOp` has the offset of the |
| location of this variable in global storage. There is enough information to |
| generate the required metadata, although it requires walking up the chain from |
| `DeclareOp` to locate `CoordinateOp` and `AddrOfOp`. |
| |
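The relation between the byte offset of a variable and its `DIExpression` can be sketched in Python as follows. This is an illustrative model only: the helper `common_block_expressions` is hypothetical, it assumes that members are stored back to back with no padding, and it assumes a 4-byte default `integer`. In the pass itself the offset would be read from the `CoordinateOp` rather than recomputed.

```python
def common_block_expressions(members):
    """Map each (name, size_bytes) to the DIExpression text for its offset.

    Returns the mapping together with the total size of the block in bytes.
    """
    expressions, offset = {}, 0
    for name, size_bytes in members:
        if offset == 0:
            # The first variable starts at the beginning of the storage.
            expressions[name] = "!DIExpression()"
        else:
            expressions[name] = f"!DIExpression(DW_OP_plus_uconst, {offset})"
        offset += size_bytes
    return expressions, offset


exprs, total = common_block_expressions([("a", 4), ("b", 4)])
print(exprs["a"])
print(exprs["b"])
print(total)
```

With the two integers from the example, `a` gets an empty expression, `b` gets `DW_OP_plus_uconst, 4`, and the total of 8 bytes agrees with the `[8 x i8]` global storage.
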
| ### Arrays |
| |
| The type of a fixed-size array is represented using `DICompositeType`. The |
| `DISubrangeAttr` is used to provide the bounds in any given dimension. |
| |
| ``` |
| integer abc(4,5) |
| |
| !1 = !DICompositeType(tag: DW_TAG_array_type, baseType: !5, elements: !2 ...) |
| !2 = !{ !3, !4 } |
| !3 = !DISubrange(lowerBound: 1, upperBound: 4 ...) |
| !4 = !DISubrange(lowerBound: 1, upperBound: 5 ...) |
| !5 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...) |
| |
| ``` |
| |
| #### Adjustable |
| |
| The debug metadata for an adjustable array looks similar to that of a fixed-size |
| array with one change. The bounds are not constant values but point to a |
| `DILocalVariableAttr`. |
| |
| In FIR, the `DeclareOp` points to a `ShapeOp` and we can walk the chain |
| to get the value that represents the array bound in any dimension. We will |
| create a `DILocalVariableAttr` that will point to that location. This |
| variable will be used in the `DISubrangeAttr`. Note that this |
| `DILocalVariableAttr` does not correspond to any source variable. |
| |
| #### Assumed Size |
| |
| This is treated as a raw array. Debug information will not provide any upper bound |
| information for the last dimension. |
| |
| #### Assumed Shape |
| The assumed shape array will use a representation similar to the fixed-size |
| array but there will be two differences. |
| |
| 1. There will be a `dataLocation` field which will be an expression. This will |
| enable the debugger to get the data pointer from the array descriptor. |
| |
| 2. The fields in `DISubrangeAttr` for array bounds will be expressions which will |
| allow the debugger to get the bounds from the descriptor. |
| |
| ``` |
| integer(4), intent(out) :: a(:,:) |
| |
| !1 = !DICompositeType(tag: DW_TAG_array_type, baseType: !8, elements: !2, dataLocation: !3) |
| !2 = !{!5, !7} |
| !3 = !DIExpression(DW_OP_push_object_address, DW_OP_deref) |
| !4 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 32, DW_OP_deref) |
| !5 = !DISubrange(lowerBound: 1, upperBound: !4 ...) |
| !6 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 56, DW_OP_deref) |
| !7 = !DISubrange(lowerBound: 1, upperBound: !6 ...) |
| !8 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...) |
| ``` |
| |
| In the assumed shape case, the rank can be determined from the FIR's `SequenceType`. |
| This allows us to generate a `DISubrangeAttr` in each dimension. |
| |
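The constants 32 and 56 in the expressions above are byte offsets into the array descriptor. The Python sketch below shows one way they could be derived. It assumes a 64-bit target and a descriptor laid out like `CFI_cdesc_t`: a 24-byte header followed by one `(lower_bound, extent, sm)` triple of 8-byte values per dimension. The helper names are hypothetical. A real implementation should take the offsets from the descriptor type and the data layout instead of hardcoding them.

```python
# Assumed layout: base_addr (8) + elem_len (8) + version (4)
# + rank, type, attribute, extra (1 byte each).
HEADER_BYTES = 24
# Assumed per-dimension entry: lower_bound, extent, sm (8 bytes each).
DIM_BYTES = 24
FIELD_BYTES = 8


def lower_bound_offset(dim):
    """Byte offset of the lower bound of zero-based dimension `dim`."""
    return HEADER_BYTES + dim * DIM_BYTES


def extent_offset(dim):
    """Byte offset of the extent of zero-based dimension `dim`."""
    return lower_bound_offset(dim) + FIELD_BYTES


def bound_expression(offset):
    """DIExpression text that loads an 8-byte field from the descriptor."""
    return ("!DIExpression(DW_OP_push_object_address, "
            f"DW_OP_plus_uconst, {offset}, DW_OP_deref)")


for dim in range(2):
    print(bound_expression(extent_offset(dim)))
```

Under these assumptions the extents of the first and second dimension sit at offsets 32 and 56, which is consistent with the example. An assumed shape dummy argument has a default lower bound of 1, in which case the upper bound equals the extent.
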
| #### Assumed Rank |
| |
| This is currently unsupported in Flang. Its representation will be similar to |
| the array representation for an assumed shape array with the following differences. |
| |
| 1. `DICompositeTypeAttr` will have a rank field which will be an expression. |
| It will be used to get the rank value from the descriptor. |
| 2. Instead of a `DISubrange` for each dimension, there will be a single |
| `DIGenericSubrange` which will allow debuggers to calculate bounds in any |
| dimension. |
| |
| ### Pointers and Allocatables |
| Pointers and allocatables will be represented using `DICompositeTypeAttr`. It |
| is a quirk of DWARF that scalar allocatable or pointer variables will show up in |
| the debug info as pointers to scalars while array pointer or allocatable |
| variables show up as arrays. The behavior is the same in gfortran and Classic Flang. |
| |
| ``` |
| integer, allocatable :: ar(:) |
| integer, pointer :: sc |
| |
| !1 = !DILocalVariable(name: "sc", type: !2) |
| !2 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !3, associated: !9 ...) |
| !3 = !DIBasicType(tag: DW_TAG_base_type, name: "integer", ...) |
| !4 = !DILocalVariable(name: "ar", type: !5 ...) |
| !5 = !DICompositeType(tag: DW_TAG_array_type, baseType: !3, elements: !6, dataLocation: !8, allocated: !9) |
| !6 = !{!7} |
| !7 = !DISubrange(lowerBound: !10, upperBound: !11 ...) |
| !8 = !DIExpression(DW_OP_push_object_address, DW_OP_deref) |
| !9 = !DIExpression(DW_OP_push_object_address, DW_OP_deref, DW_OP_lit0, DW_OP_ne) |
| !10 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 24, DW_OP_deref) |
| !11 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 32, DW_OP_deref) |
| |
| ``` |
| |
| In FIR, these variables are represented as `!fir.box<!fir.heap<>>` or |
| `!fir.box<!fir.ptr<>>`. There is also an `allocatable` or `pointer` attribute on |
| the `DeclareOp`. This allows us to generate the allocated/associated status of |
| these variables. The metadata to get the information from the descriptor is |
| similar to that for arrays. |
| |
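To make the `allocated` and bound expressions more concrete, the Python sketch below evaluates them the way a debugger's DWARF stack machine would. It is a toy model, not debugger code: `evaluate` is a hypothetical helper, only the five operations used in this section are supported, and memory is modelled as a dictionary from addresses to 64-bit values.

```python
def evaluate(expr, object_address, memory):
    """Evaluate a list of DWARF ops; `memory` maps address -> 64-bit value."""
    stack = []
    ops = iter(expr)
    for op in ops:
        if op == "DW_OP_push_object_address":
            stack.append(object_address)
        elif op == "DW_OP_deref":
            stack.append(memory[stack.pop()])
        elif op == "DW_OP_plus_uconst":
            # The constant operand follows the opcode in the list.
            stack.append(stack.pop() + next(ops))
        elif op == "DW_OP_lit0":
            stack.append(0)
        elif op == "DW_OP_ne":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(1 if lhs != rhs else 0)
        else:
            raise ValueError(f"unsupported op: {op}")
    return stack[-1]


ALLOCATED = ["DW_OP_push_object_address", "DW_OP_deref",
             "DW_OP_lit0", "DW_OP_ne"]
UPPER_BOUND = ["DW_OP_push_object_address", "DW_OP_plus_uconst", 32,
               "DW_OP_deref"]

# A made-up descriptor at 0x1000 whose base address is 0x2000.
memory = {0x1000: 0x2000, 0x1000 + 24: 1, 0x1000 + 32: 10}
print(evaluate(ALLOCATED, 0x1000, memory))
print(evaluate(UPPER_BOUND, 0x1000, memory))
```

The `allocated` expression reads the base address from the start of the descriptor and compares it with zero, so it evaluates to 1 for the descriptor above and to 0 when the base address is null.
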
| ### Strings |
| |
| The `DIStringTypeAttr` can represent both fixed size and allocatable strings. For |
| the allocatable case, the `stringLengthExpression` and `stringLocationExpression` |
| are used to provide the length and the location of the string respectively. |
| |
| ``` |
| character(len=:), allocatable :: var |
| character(len=20) :: fixed |
| |
| !1 = !DILocalVariable(name: "var", type: !2) |
| !2 = !DIStringType(name: "character(*)", stringLengthExpression: !4, stringLocationExpression: !3 ...) |
| !3 = !DIExpression(DW_OP_push_object_address, DW_OP_deref) |
| !4 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 8) |
| |
| !5 = !DILocalVariable(name: "fixed", type: !6) |
| !6 = !DIStringType(name: "character (20)", size: 160) |
| |
| ``` |
| |
| ### Association |
| |
| Associated variables will be treated like normal variables, although we may need |
| to handle the case where the `DeclareOp` of one variable points to the |
| `DeclareOp` of another variable (e.g. a => b). |
| |
| ### Namelists |
| |
| FIR does not seem to have a way to extract information about namelists. |
| |
| ``` |
| namelist /abc/ x3, y3 |
| |
| (gdb) p abc |
| $1 = ( x3 = 100, y3 = 500 ) |
| (gdb) p x3 |
| $2 = 100 |
| (gdb) p y3 |
| $3 = 500 |
| ``` |
| |
| Even without namelist support, we should be able to see the value of the |
| individual variables like `x3` and `y3` in the above example. But we would not |
| be able to evaluate the namelist and have the debugger print the values of all |
| the variables in it as shown above for `abc`. |
| |
| ## Missing metadata in MLIR |
| |
| Some metadata types that are needed for Fortran are present in LLVM IR but are |
| absent from MLIR. A non-exhaustive list is given below. |
| |
| 1. `DICommonBlockAttr` |
| 2. `DIGenericSubrangeAttr` |
| 3. `DISubrangeAttr` in MLIR takes `IntegerAttr` at the moment so it only works |
| with fixed-size arrays. It needs to also accept `DIExpressionAttr` or |
| `DILocalVariableAttr` to support assumed shape and adjustable arrays. |
| 4. The `DICompositeTypeAttr` will need to have fields for `dataLocation`, |
| `rank`, `allocated` and `associated`. |
| 5. `DIStringTypeAttr` |
| |
| # Testing |
| |
| - LLVM LIT tests will be added to test: |
|   - the driver and ensure that it passes the line table and full debug |
|     info generation options appropriately. |
|   - that the pass works as expected and generates debug info. Tests will use |
|     `fir-opt`. |
|   - with `flang -fc1` that end-to-end debug info generation works. |
| - Manual external tests will be written to ensure that the following works |
|   in debug tools: |
|   - Break at lines. |
|   - Break at functions. |
|   - Print the type (ptype) of function names. |
|   - Print the values and types (ptype) of various types of variables. |
| - Manually run `GDB`'s gdb.fortran testsuite with llvm-flang. |
| |
| # Resources |
| - [1] https://dwarfstd.org/doc/DWARF5.pdf |
| - [2] https://llvm.org/docs/LangRef.html#metadata |
| - [3] https://archive.fosdem.org/2022/schedule/event/llvm_fortran_debug/ |
| - [4] https://github.com/llvm/llvm-project/blob/main/mlir/lib/Target/LLVMIR/DebugTranslation.cpp |
| - [5] https://github.com/llvm/llvm-project/pull/84202 |