| # Stack arrays pass |
| ## Problem Description |
| In gfortran, `-fstack-arrays` will cause all local arrays, including those of |
| unknown size, to be allocated from stack memory. Gfortran enables this flag by |
| default at `-Ofast`. |
| |
| Flang already allocates all local arrays on the stack (the |
| `memory-allocation-opt` pass can move large allocations to the heap, but by |
| default it will not). But there are some cases where temporary arrays are |
| created on the heap by Flang. Moving these temporary arrays is the goal here. |
| |
| ## Proposed Solution |
| One approach would be to write a pass which attempts to move heap allocations to |
| the stack (like the `memory-allocation-opt` pass in reverse). This approach has |
| the advantage of a self-contained implementation and ensuring that newly added |
| cases are covered. |
| |
| Another approach would be to modify each place in which arrays are allocated on |
| the heap to instead generate a stack allocation. The advantage of the second |
| approach is that it would be difficult to match up all `fir.allocmem` operations |
| with their associated `fir.freemem`. In general, the lifetimes of heap allocated |
| memory are not constrained by the current function's stack frame and so cannot |
| be always converted to stack allocations. It is much easier to swap heap |
| allocations for stack allocations when they are first generated because the |
| lifetime information is conveniently available. |
| |
| For example, to rewrite the heap allocation in the `array-value-copy` pass with |
| a stack allocation using the first approach would require analysis to ensure |
| that the heap allocation is always freed before the function returns. This is |
| much more complex than never generating a heap allocation (and free) in the |
| first place (the second approach). |
| |
| The plan is to take the more complex first approach so that newly added changes |
| to lowering code do not need to be made to support the stack arrays option. The |
| general problem of determining heap allocation lifetimes can be simplified in |
| this case because only those allocations which are always freed before exit from |
| the same function are possible to be moved to heap allocations in that |
| function's stack frame. Furthermore, the aim of this pass would be to only move |
| those allocations generated by Flang, and so some of the more difficult cases |
| can be avoided. Where the transformation is not possible, the existing heap |
| allocation will be kept. |
| |
| ## Implementation details overview |
| In order to limit the complexity of the proposed pass, it is useful to |
| understand the situations in which Flang will generate heap allocations. |
| |
| ### Known Heap Array Allocations |
| Flang allocates most arrays on the stack by default, but there are a few cases |
| where temporary arrays are allocated on the heap: |
| - `flang/lib/Optimizer/Transforms/ArrayValueCopy.cpp` |
| - `flang/lib/Optimizer/Transforms/MemoryAllocation.cpp` |
| - `flang/lib/Lower/ConvertExpr.cpp` |
| - `flang/lib/Lower/IntrinsicCall.cpp` |
| - `flang/lib/Lower/ConvertVariable.cpp` |
| |
| Lowering code is being updated and in the future, temporaries for expressions |
| will be created in the HLFIR bufferization pass in |
| `flang/lib/Optimizer/HLFIR/Trnasforms/BufferizeHLFIR.cpp`. |
| |
| #### `ArrayValueCopy.cpp` |
| Memory is allocated for a temporary array in `allocateArrayTemp()`. This |
| temporary array is used to ensure that assignments of one array to itself |
| produce the required value. E.g. |
| |
| ``` |
| integer, dimension(5), intent(inout) :: x |
| x(3,4) = x(1,2) |
| ``` |
| |
| #### `MemoryAllocation.cpp` |
| The default options for the Memory Allocation transformation ensure that no |
| array allocations, no matter how large, are moved from the stack to the heap. |
| |
| #### `ConvertExpr.cpp` |
| `ConvertExpr.cpp` allocates many array temporaries on the heap: |
| - Passing array arguments by value or when they need re-shaping |
| - Lowering elemental array expressions |
| - Lowering mask expressions |
| - Array constructors |
| |
| The last two of these cases are **not** covered by the current stack arrays pass |
| design. |
| |
| The FIR code generated for mask expressions (the WHERE construct) sets a |
| boolean variable to indicate whether a heap allocation was necessary. The |
| allocation is only freed if the variable indicates that the allocation was |
| performed to begin with. The proposed dataflow analysis is not intelligent |
| enough to statically determine that the boolean variable will always be true |
| when the allocation is performed. Beyond this, the control flow in the generated |
| FIR code passes the allocated memory through `fir.result`, resulting in a |
| different SSA value to be allocated and freed, causing the analysis not to |
| realise that the allocated memory is freed. The most convenient solution here |
| would be to generate less complicated FIR code, as the existing codegen has |
| known bugs: https://github.com/llvm/llvm-project/issues/56921, |
| https://github.com/llvm/llvm-project/issues/59803. |
| |
| Code generated for array constructors uses `realloc()` to grow the allocated |
| buffer because the size of the resulting array cannot always be determined |
| ahead of running the constructor. This makes this temporary unsuitable |
| for allocation on the stack. |
| |
| #### `IntrinsicCall.cpp` |
| The existing design is for the runtime to do the allocation and the lowering |
| code to insert `fir.freemem` to remove the allocation. It is not clear whether |
| this can be replaced by adapting lowering code to do stack allocations and to |
| pass these to the runtime. This would be a significant change and so is out of |
| scope of `-fstack-arrays`. |
| |
| #### `ConvertVariable.cpp` |
| See `Fortran::lower::MapSymbolAttributes`: |
| |
| Dummy arguments that are not declared in the current entry point require a |
| skeleton definition. Most such "unused" dummies will not survive into final |
| generated code, but some will. It is illegal to reference one at run time if it |
| does. There are three ways these dummies can be mapped to a value: |
| - a `fir::UndefOp` value: preferable but can't be used for dummies with dynamic |
| bounds or used to define another dummy. |
| - A stack slot. This has intermediate-weight cost but may not be usable for |
| objects with dynamic bounds. |
| - A heap box/descriptor is the heaviest weight option, only used for dynamic |
| objects. |
| |
| These heap allocations will be out of scope for `-fstack-arrays` because the |
| existing implementation already uses stack allocations (or better) where |
| possible, and most such dummy arguments will be removed from generated code. |
| |
| #### `BufferizeHLFIR.cpp` |
| TODO |
| |
| ### Detecting Allocations to Move |
| Allocations which could be moved to the stack will be detected by performing a |
| forward dense data flow analysis using `mlir::dataflow::DenseForwardDataFlowAnalysis`. |
| This analysis will search for SSA values created by a `fir.allocmem` which are |
| always freed using `fir.freemem` within the same function. |
| |
| Tracking allocations by SSA values will miss some cases where address |
| calculations are duplicated in different blocks: resulting in different SSA |
| values as arguments for the allocation and the free. It is believed that simple |
| tracking of SSA values is sufficient for detecting the allocations for array |
| temporaries added by Flang, because these temporaries should be simple arrays: |
| not nested inside of derived types or other arrays. |
| |
| Arrays allocated by an `allocate` statement in source code should not be moved |
| to the stack. These will be identified by adding an attribute to these |
| `fir.allocmem` operations when they are lowered. Intrinsics such as `allocated` |
| and `move_alloc` do not need to be supported because the array temporaries moved |
| to the stack will never be visible in user code. |
| |
| Only allocations for arrays will be considered for moving to the stack. |
| |
| When conducting the dense data flow analysis, the pass will assume that existing |
| allocations are not freed inside called functions. This is true for the existing |
| heap array temporary allocations generated by Flang. If an allocation were to be |
| freed inside of a called function, the allocation would presumably not also be |
| freed later in the caller function (as this would be a double-free). Therefore |
| the stack arrays pass would not find where the allocation was freed and so not |
| transform the allocation. It is necessary to make this assumption so that the |
| stack arrays pass can transform array allocations created for pass-by-value |
| function arguments. |
| |
| ### Moving Allocations |
| The `fir.allocmem` will be replaced by a `fir.alloca` with the same arguments. |
| The `fir.freemem` will be removed entirely. |
| |
| Where possible, the `fir.alloca` should be placed in the function entry block |
| (so we can be sure it only happens once). This may not be possible if the array |
| has non-constant extents (causing the `fir.alloca` to have SSA values as |
| operands). In this case, the `fir.alloca` will be placed immediately after the |
| last operand becomes available. |
| |
| If this location is inside a loop (either an `mlir::LoopLikeOpInterface` or a |
| cyclic CFG), the transformation should attempt to use the `llvm.stacksave`/ |
| `llvm.stackrestore` intrinsics to ensure that the stack does not grow on every |
| loop iteration. Use of these intrinsics is considered valid when the allocation |
| and deallocation occur within the same block and there are no other stack |
| allocations between them. If this is not the case, the existing heap allocation |
| will be preserved. |
| |
| ### FIR attribute |
| A FIR attribute will be added to distinguish `fir.allocmem` arising from |
| `allocate` statements in source from `fir.allocmem` operations added by Flang. |
| The attribute will be called `"fir.must_be_heap"` and will have a boolean value: |
| `true` meaning that the stack arrays pass must not move the allocation, `false` |
| meaning that stack arrays may move the allocation. Not specifying the attribute |
| will be equivalent to setting it to `false`. |
| |
| ## Testing Plan |
| FileCheck tests will be written to check each of the above identified sources of |
| heap allocated array temporaries are detected and converted by the new pass. |
| |
| Another test will check that `allocate` statements in source code will not be |
| moved to the stack. |