flang/docs/fstack-arrays.md - toolchain/llvm-project - Git at Google

 # Stack arrays pass
 ## Problem Description
 In gfortran, `-fstack-arrays` will cause all local arrays, including those of
 unknown size, to be allocated from stack memory. Gfortran enables this flag by
 default at `-Ofast`.

 Flang already allocates all local arrays on the stack (the
 `memory-allocation-opt` pass can move large allocations to the heap, but by
 default it will not). But there are some cases where temporary arrays are
 created on the heap by Flang. Moving these temporary arrays is the goal here.

 ## Proposed Solution
 One approach would be to write a pass which attempts to move heap allocations to
 the stack (like the `memory-allocation-opt` pass in reverse). This approach has
 the advantage of a self-contained implementation and ensuring that newly added
 cases are covered.

 Another approach would be to modify each place in which arrays are allocated on
 the heap to instead generate a stack allocation. The advantage of the second
 approach is that it would be difficult to match up all `fir.allocmem` operations
 with their associated `fir.freemem`. In general, the lifetimes of heap allocated
 memory are not constrained by the current function's stack frame and so cannot
 be always converted to stack allocations. It is much easier to swap heap
 allocations for stack allocations when they are first generated because the
 lifetime information is conveniently available.

 For example, to rewrite the heap allocation in the `array-value-copy` pass with
 a stack allocation using the first approach would require analysis to ensure
 that the heap allocation is always freed before the function returns. This is
 much more complex than never generating a heap allocation (and free) in the
 first place (the second approach).

 The plan is to take the more complex first approach so that newly added changes
 to lowering code do not need to be made to support the stack arrays option. The
 general problem of determining heap allocation lifetimes can be simplified in
 this case because only those allocations which are always freed before exit from
 the same function are possible to be moved to heap allocations in that
 function's stack frame. Furthermore, the aim of this pass would be to only move
 those allocations generated by Flang, and so some of the more difficult cases
 can be avoided. Where the transformation is not possible, the existing heap
 allocation will be kept.

 ## Implementation details overview
 In order to limit the complexity of the proposed pass, it is useful to
 understand the situations in which Flang will generate heap allocations.

 ### Known Heap Array Allocations
 Flang allocates most arrays on the stack by default, but there are a few cases
 where temporary arrays are allocated on the heap:
 - `flang/lib/Optimizer/Transforms/ArrayValueCopy.cpp`
 - `flang/lib/Optimizer/Transforms/MemoryAllocation.cpp`
 - `flang/lib/Lower/ConvertExpr.cpp`
 - `flang/lib/Lower/IntrinsicCall.cpp`
 - `flang/lib/Lower/ConvertVariable.cpp`

 Lowering code is being updated and in the future, temporaries for expressions
 will be created in the HLFIR bufferization pass in
 `flang/lib/Optimizer/HLFIR/Trnasforms/BufferizeHLFIR.cpp`.

 #### `ArrayValueCopy.cpp`
 Memory is allocated for a temporary array in `allocateArrayTemp()`. This
 temporary array is used to ensure that assignments of one array to itself
 produce the required value. E.g.

 ```
 integer, dimension(5), intent(inout) :: x
 x(3,4) = x(1,2)
 ```

 #### `MemoryAllocation.cpp`
 The default options for the Memory Allocation transformation ensure that no
 array allocations, no matter how large, are moved from the stack to the heap.

 #### `ConvertExpr.cpp`
 `ConvertExpr.cpp` allocates many array temporaries on the heap:
   - Passing array arguments by value or when they need re-shaping
   - Lowering elemental array expressions
   - Lowering mask expressions
   - Array constructors

 The last two of these cases are **not** covered by the current stack arrays pass
 design.

 The FIR code generated for mask expressions (the WHERE construct) sets a
 boolean variable to indicate whether a heap allocation was necessary. The
 allocation is only freed if the variable indicates that the allocation was
 performed to begin with. The proposed dataflow analysis is not intelligent
 enough to statically determine that the boolean variable will always be true
 when the allocation is performed. Beyond this, the control flow in the generated
 FIR code passes the allocated memory through `fir.result`, resulting in a
 different SSA value to be allocated and freed, causing the analysis not to
 realise that the allocated memory is freed. The most convenient solution here
 would be to generate less complicated FIR code, as the existing codegen has
 known bugs: https://github.com/llvm/llvm-project/issues/56921,
 https://github.com/llvm/llvm-project/issues/59803.

 Code generated for array constructors uses `realloc()` to grow the allocated
 buffer because the size of the resulting array cannot always be determined
 ahead of running the constructor. This makes this temporary unsuitable
 for allocation on the stack.

 #### `IntrinsicCall.cpp`
 The existing design is for the runtime to do the allocation and the lowering
 code to insert `fir.freemem` to remove the allocation. It is not clear whether
 this can be replaced by adapting lowering code to do stack allocations and to
 pass these to the runtime. This would be a significant change and so is out of
 scope of `-fstack-arrays`.

 #### `ConvertVariable.cpp`
 See `Fortran::lower::MapSymbolAttributes`:

 Dummy arguments that are not declared in the current entry point require a
 skeleton definition. Most such "unused" dummies will not survive into final
 generated code, but some will. It is illegal to reference one at run time if it
 does. There are three ways these dummies can be mapped to a value:
 - a `fir::UndefOp` value: preferable but can't be used for dummies with dynamic
   bounds or used to define another dummy.
 - A stack slot. This has intermediate-weight cost but may not be usable for
   objects with dynamic bounds.
 - A heap box/descriptor is the heaviest weight option, only used for dynamic
   objects.

 These heap allocations will be out of scope for `-fstack-arrays` because the
 existing implementation already uses stack allocations (or better) where
 possible, and most such dummy arguments will be removed from generated code.

 #### `BufferizeHLFIR.cpp`
 TODO

 ### Detecting Allocations to Move
 Allocations which could be moved to the stack will be detected by performing a
 forward dense data flow analysis using `mlir::dataflow::DenseForwardDataFlowAnalysis`.
 This analysis will search for SSA values created by a `fir.allocmem` which are
 always freed using `fir.freemem` within the same function.

 Tracking allocations by SSA values will miss some cases where address
 calculations are duplicated in different blocks: resulting in different SSA
 values as arguments for the allocation and the free. It is believed that simple
 tracking of SSA values is sufficient for detecting the allocations for array
 temporaries added by Flang, because these temporaries should be simple arrays:
 not nested inside of derived types or other arrays.

 Arrays allocated by an `allocate` statement in source code should not be moved
 to the stack. These will be identified by adding an attribute to these
 `fir.allocmem` operations when they are lowered. Intrinsics such as `allocated`
 and `move_alloc` do not need to be supported because the array temporaries moved
 to the stack will never be visible in user code.

 Only allocations for arrays will be considered for moving to the stack.

 When conducting the dense data flow analysis, the pass will assume that existing
 allocations are not freed inside called functions. This is true for the existing
 heap array temporary allocations generated by Flang. If an allocation were to be
 freed inside of a called function, the allocation would presumably not also be
 freed later in the caller function (as this would be a double-free). Therefore
 the stack arrays pass would not find where the allocation was freed and so not
 transform the allocation. It is necessary to make this assumption so that the
 stack arrays pass can transform array allocations created for pass-by-value
 function arguments.

 ### Moving Allocations
 The `fir.allocmem` will be replaced by a `fir.alloca` with the same arguments.
 The `fir.freemem` will be removed entirely.

 Where possible, the `fir.alloca` should be placed in the function entry block
 (so we can be sure it only happens once). This may not be possible if the array
 has non-constant extents (causing the `fir.alloca` to have SSA values as
 operands). In this case, the `fir.alloca` will be placed immediately after the
 last operand becomes available.

 If this location is inside a loop (either an `mlir::LoopLikeOpInterface` or a
 cyclic CFG), the transformation should attempt to use the `llvm.stacksave`/
 `llvm.stackrestore` intrinsics to ensure that the stack does not grow on every
 loop iteration. Use of these intrinsics is considered valid when the allocation
 and deallocation occur within the same block and there are no other stack
 allocations between them. If this is not the case, the existing heap allocation
 will be preserved.

 ### FIR attribute
 A FIR attribute will be added to distinguish `fir.allocmem` arising from
 `allocate` statements in source from `fir.allocmem` operations  added by Flang.
 The attribute will be called `"fir.must_be_heap"` and will have a boolean value:
 `true` meaning that the stack arrays pass must not move the allocation, `false`
 meaning that stack arrays may move the allocation. Not specifying the attribute
 will be equivalent to setting it to `false`.

 ## Testing Plan
 FileCheck tests will be written to check each of the above identified sources of
 heap allocated array temporaries are detected and converted by the new pass.

 Another test will check that `allocate` statements in source code will not be
 moved to the stack.
	# Stack arrays pass
	## Problem Description
	In gfortran, `-fstack-arrays` will cause all local arrays, including those of
	unknown size, to be allocated from stack memory. Gfortran enables this flag by
	default at `-Ofast`.

	Flang already allocates all local arrays on the stack (the
	`memory-allocation-opt` pass can move large allocations to the heap, but by
	default it will not). But there are some cases where temporary arrays are
	created on the heap by Flang. Moving these temporary arrays is the goal here.

	## Proposed Solution
	One approach would be to write a pass which attempts to move heap allocations to
	the stack (like the `memory-allocation-opt` pass in reverse). This approach has
	the advantage of a self-contained implementation and ensuring that newly added
	cases are covered.

	Another approach would be to modify each place in which arrays are allocated on
	the heap to instead generate a stack allocation. The advantage of the second
	approach is that it would be difficult to match up all `fir.allocmem` operations
	with their associated `fir.freemem`. In general, the lifetimes of heap allocated
	memory are not constrained by the current function's stack frame and so cannot
	be always converted to stack allocations. It is much easier to swap heap
	allocations for stack allocations when they are first generated because the
	lifetime information is conveniently available.

	For example, to rewrite the heap allocation in the `array-value-copy` pass with
	a stack allocation using the first approach would require analysis to ensure
	that the heap allocation is always freed before the function returns. This is
	much more complex than never generating a heap allocation (and free) in the
	first place (the second approach).

	The plan is to take the more complex first approach so that newly added changes
	to lowering code do not need to be made to support the stack arrays option. The
	general problem of determining heap allocation lifetimes can be simplified in
	this case because only those allocations which are always freed before exit from
	the same function are possible to be moved to heap allocations in that
	function's stack frame. Furthermore, the aim of this pass would be to only move
	those allocations generated by Flang, and so some of the more difficult cases
	can be avoided. Where the transformation is not possible, the existing heap
	allocation will be kept.

	## Implementation details overview
	In order to limit the complexity of the proposed pass, it is useful to
	understand the situations in which Flang will generate heap allocations.

	### Known Heap Array Allocations
	Flang allocates most arrays on the stack by default, but there are a few cases
	where temporary arrays are allocated on the heap:
	- `flang/lib/Optimizer/Transforms/ArrayValueCopy.cpp`
	- `flang/lib/Optimizer/Transforms/MemoryAllocation.cpp`
	- `flang/lib/Lower/ConvertExpr.cpp`
	- `flang/lib/Lower/IntrinsicCall.cpp`
	- `flang/lib/Lower/ConvertVariable.cpp`

	Lowering code is being updated and in the future, temporaries for expressions
	will be created in the HLFIR bufferization pass in
	`flang/lib/Optimizer/HLFIR/Trnasforms/BufferizeHLFIR.cpp`.

	#### `ArrayValueCopy.cpp`
	Memory is allocated for a temporary array in `allocateArrayTemp()`. This
	temporary array is used to ensure that assignments of one array to itself
	produce the required value. E.g.

	```
	integer, dimension(5), intent(inout) :: x
	x(3,4) = x(1,2)
	```

	#### `MemoryAllocation.cpp`
	The default options for the Memory Allocation transformation ensure that no
	array allocations, no matter how large, are moved from the stack to the heap.

	#### `ConvertExpr.cpp`
	`ConvertExpr.cpp` allocates many array temporaries on the heap:
	- Passing array arguments by value or when they need re-shaping
	- Lowering elemental array expressions
	- Lowering mask expressions
	- Array constructors

	The last two of these cases are not covered by the current stack arrays pass
	design.

	The FIR code generated for mask expressions (the WHERE construct) sets a
	boolean variable to indicate whether a heap allocation was necessary. The
	allocation is only freed if the variable indicates that the allocation was
	performed to begin with. The proposed dataflow analysis is not intelligent
	enough to statically determine that the boolean variable will always be true
	when the allocation is performed. Beyond this, the control flow in the generated
	FIR code passes the allocated memory through `fir.result`, resulting in a
	different SSA value to be allocated and freed, causing the analysis not to
	realise that the allocated memory is freed. The most convenient solution here
	would be to generate less complicated FIR code, as the existing codegen has
	known bugs: https://github.com/llvm/llvm-project/issues/56921,
	https://github.com/llvm/llvm-project/issues/59803.

	Code generated for array constructors uses `realloc()` to grow the allocated
	buffer because the size of the resulting array cannot always be determined
	ahead of running the constructor. This makes this temporary unsuitable
	for allocation on the stack.

	#### `IntrinsicCall.cpp`
	The existing design is for the runtime to do the allocation and the lowering
	code to insert `fir.freemem` to remove the allocation. It is not clear whether
	this can be replaced by adapting lowering code to do stack allocations and to
	pass these to the runtime. This would be a significant change and so is out of
	scope of `-fstack-arrays`.

	#### `ConvertVariable.cpp`
	See `Fortran::lower::MapSymbolAttributes`:

	Dummy arguments that are not declared in the current entry point require a
	skeleton definition. Most such "unused" dummies will not survive into final
	generated code, but some will. It is illegal to reference one at run time if it
	does. There are three ways these dummies can be mapped to a value:
	- a `fir::UndefOp` value: preferable but can't be used for dummies with dynamic
	bounds or used to define another dummy.
	- A stack slot. This has intermediate-weight cost but may not be usable for
	objects with dynamic bounds.
	- A heap box/descriptor is the heaviest weight option, only used for dynamic
	objects.

	These heap allocations will be out of scope for `-fstack-arrays` because the
	existing implementation already uses stack allocations (or better) where
	possible, and most such dummy arguments will be removed from generated code.

	#### `BufferizeHLFIR.cpp`
	TODO

	### Detecting Allocations to Move
	Allocations which could be moved to the stack will be detected by performing a
	forward dense data flow analysis using `mlir::dataflow::DenseForwardDataFlowAnalysis`.
	This analysis will search for SSA values created by a `fir.allocmem` which are
	always freed using `fir.freemem` within the same function.

	Tracking allocations by SSA values will miss some cases where address
	calculations are duplicated in different blocks: resulting in different SSA
	values as arguments for the allocation and the free. It is believed that simple
	tracking of SSA values is sufficient for detecting the allocations for array
	temporaries added by Flang, because these temporaries should be simple arrays:
	not nested inside of derived types or other arrays.

	Arrays allocated by an `allocate` statement in source code should not be moved
	to the stack. These will be identified by adding an attribute to these
	`fir.allocmem` operations when they are lowered. Intrinsics such as `allocated`
	and `move_alloc` do not need to be supported because the array temporaries moved
	to the stack will never be visible in user code.

	Only allocations for arrays will be considered for moving to the stack.

	When conducting the dense data flow analysis, the pass will assume that existing
	allocations are not freed inside called functions. This is true for the existing
	heap array temporary allocations generated by Flang. If an allocation were to be
	freed inside of a called function, the allocation would presumably not also be
	freed later in the caller function (as this would be a double-free). Therefore
	the stack arrays pass would not find where the allocation was freed and so not
	transform the allocation. It is necessary to make this assumption so that the
	stack arrays pass can transform array allocations created for pass-by-value
	function arguments.

	### Moving Allocations
	The `fir.allocmem` will be replaced by a `fir.alloca` with the same arguments.
	The `fir.freemem` will be removed entirely.

	Where possible, the `fir.alloca` should be placed in the function entry block
	(so we can be sure it only happens once). This may not be possible if the array
	has non-constant extents (causing the `fir.alloca` to have SSA values as
	operands). In this case, the `fir.alloca` will be placed immediately after the
	last operand becomes available.

	If this location is inside a loop (either an `mlir::LoopLikeOpInterface` or a
	cyclic CFG), the transformation should attempt to use the `llvm.stacksave`/
	`llvm.stackrestore` intrinsics to ensure that the stack does not grow on every
	loop iteration. Use of these intrinsics is considered valid when the allocation
	and deallocation occur within the same block and there are no other stack
	allocations between them. If this is not the case, the existing heap allocation
	will be preserved.

	### FIR attribute
	A FIR attribute will be added to distinguish `fir.allocmem` arising from
	`allocate` statements in source from `fir.allocmem` operations added by Flang.
	The attribute will be called `"fir.must_be_heap"` and will have a boolean value:
	`true` meaning that the stack arrays pass must not move the allocation, `false`
	meaning that stack arrays may move the allocation. Not specifying the attribute
	will be equivalent to setting it to `false`.

	## Testing Plan
	FileCheck tests will be written to check each of the above identified sources of
	heap allocated array temporaries are detected and converted by the new pass.

	Another test will check that `allocate` statements in source code will not be
	moved to the stack.