| <!--===- docs/OpenMP-declare-target.md |
| |
| Part of the LLVM Project, under the Apache License v2.0 with LLVM |
| Exceptions. |
| See https://llvm.org/LICENSE.txt for license information. |
| SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception |
| |
| --> |
| |
| # Introduction to Declare Target |
| |
| In OpenMP, `declare target` is a directive that can be applied to a function or |
| variable (primarily global) to tell the compiler that it should be |
| generated in a particular device's environment, that is, whether it |
| should be emitted for the host, the device, or both. An example of its usage for |
| both data and functions can be seen below. |
| |
| ```Fortran |
| module test_0 |
| integer :: sp = 0 |
| !$omp declare target link(sp) |
| end module test_0 |
| |
| program main |
| use test_0 |
| !$omp target map(tofrom:sp) |
| sp = 1 |
| !$omp end target |
| end program |
| ``` |
| |
| In the above example, we create a variable in a separate module, mark it |
| as `declare target` with the `link` clause and then map it into a `target` |
| region, which embeds it into the device IR, and assign to it on the device. |
| |
| |
| ```Fortran |
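| function func_t_device() result(i) |
| !$omp declare target to(func_t_device) device_type(nohost) |
| integer :: i |
| i = 1 |
| end function func_t_device |
| |
| program main |
| integer :: res |
| ! External function with an implicit interface; a function is referenced |
| ! in an expression rather than invoked with `call`. |
| integer :: func_t_device |
| !$omp target map(from:res) |
| res = func_t_device() |
| !$omp end target |
| end program |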
| ``` |
| |
| In the above example, we use `declare target` to state that a function is |
| required on the device and that we will not be utilising it on the host, |
| so the compiler is in theory free to remove or ignore it there. In this case, |
| a user could also leave the `declare target` off the function and it |
| would be implicitly marked `declare target any` (for both host and device), |
| as it is utilised within a target region. |
| |
| # Declare Target as represented in the OpenMP Dialect |
| |
| In the OpenMP Dialect, `declare target` is not represented by a specific |
| `operation`. Instead, it is an OpenMP dialect specific `attribute` that can be |
| applied to any operation in any dialect, which helps to simplify its |
| utilisation. Rather than replacing or modifying existing global or |
| function `operations` in a dialect, it is attached to them as extra metadata that |
| the lowering can use in different ways as necessary. |
| |
| The `attribute` is composed of multiple fields representing the clauses you |
| would find on the `declare target` directive, namely the device type (`nohost`, |
| `any`, `host`) and the capture clause (`link` or `to`). A small example of |
| `declare target` applied to a Fortran `real` can be found below: |
| |
| ``` |
| fir.global internal @_QFEi {omp.declare_target = |
| #omp.declaretarget<device_type = (any), capture_clause = (to)>} : f32 { |
| %0 = fir.undefined f32 |
| fir.has_value %0 : f32 |
| } |
| ``` |
| |
| This would look similar for function style `operations`. |
| |
| The application and access of this attribute is aided by an OpenMP Dialect |
| MLIR Interface named `DeclareTargetInterface`, which can be utilised on |
| operations to access the appropriate interface functions, e.g.: |
| |
| ```C++ |
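| auto declareTargetGlobal = |
|     llvm::dyn_cast<mlir::omp::DeclareTargetInterface>(Op.getOperation()); |
| // dyn_cast yields a null interface when the operation does not implement it. |
| if (declareTargetGlobal && declareTargetGlobal.isDeclareTarget()) { |
|   // The operation carries the `omp.declare_target` attribute. |
| } |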
| ``` |
| |
| # Declare Target Fortran OpenMP Lowering |
| |
| The initial lowering of `declare target` to MLIR for both use-cases is done |
| inside of the usual OpenMP lowering in flang/lib/Lower/OpenMP.cpp. However, |
| Flang's lowering bridge in flang/lib/Lower/Bridge.cpp also makes some direct |
| calls to `declare target` related functions. |
| |
| The marking of operations with the declare target attribute happens in two |
| phases, the second one optional and contingent on the first failing. The |
| initial phase happens when the declare target directive and its clauses |
| are initially processed, with the primary data gathering for the directive and |
| clause happening in a function called `getDeclareTargetInfo`. This is then used |
| to feed the `markDeclareTarget` function, which does the actual marking |
| utilising the `DeclareTargetInterface`. If it encounters a variable or function |
| that has been marked twice over multiple directives with two differing device |
| types (e.g. `host`, `nohost`), then it will swap the device type to `any`. |
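The device type widening described above can be illustrated with a small standalone model. This is only a sketch of the rule, not Flang's `markDeclareTarget`: the `DeviceType` enum, the `MarkTable` alias and the `markSymbol` function are hypothetical names, and a `std::map` keyed by symbol name stands in for attributes attached to real operations.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified stand-in for the device types a `declare target` can carry.
enum class DeviceType { Host, NoHost, Any };

// Hypothetical table standing in for attributes attached to operations.
using MarkTable = std::map<std::string, DeviceType>;

// Mark `symbol` with `requested`. If an earlier directive already marked the
// symbol with a different device type, widen the marking to Any.
inline DeviceType markSymbol(MarkTable &marks, const std::string &symbol,
                             DeviceType requested) {
  MarkTable::iterator it = marks.find(symbol);
  if (it == marks.end()) {
    marks[symbol] = requested; // First marking: take the request as-is.
    return requested;
  }
  if (it->second != requested)
    it->second = DeviceType::Any; // Conflicting directives: needed on both.
  return it->second;
}
```

Marking the same symbol first with `Host` and then with `NoHost` leaves it as `Any` in this model, while repeating the same device type leaves the marking unchanged.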
| |
| Whenever we invoke `genFIR` on an `OpenMPDeclarativeConstruct` from the |
| lowering bridge, we are also invoking another function called |
| `gatherOpenMPDeferredDeclareTargets`, which gathers information relevant to the |
| application of the `declare target` attribute. This information |
| includes the symbol that it should be applied to, device type clause, |
| and capture clause, and it is stored in a vector that is part of the lowering |
| bridge's instantiation of the `AbstractConverter`. It is only stored if we |
| encounter a function or variable symbol that does not have an operation |
| instantiated for it yet. This cannot happen as part of the |
| initial marking as we must store this data in the lowering bridge and we |
| only have access to the abstract version of the converter via the OpenMP |
| lowering. |
| |
| The information produced by the first phase is used in the second phase, |
| which is a form of deferred processing of the `declare target` marked |
| operations that have delayed generation and cannot be processed in the |
| first phase. Currently, the main case in which this occurs is when a |
| Fortran function interface has been marked. The second phase is |
| performed by the function |
| `markOpenMPDeferredDeclareTargetFunctions`, which is called from the lowering |
| bridge at the end of the lowering process, allowing us to mark those |
| operations where possible. It iterates over the data previously gathered by |
| `gatherOpenMPDeferredDeclareTargets`, |
| checking whether any of the recorded symbols have now had their corresponding |
| operations instantiated, and applies the declare target attribute where |
| possible utilising `markDeclareTarget`. However, it must be noted that it |
| is still possible for operations not to be generated for certain symbols, |
| in particular the case of function interfaces that are not directly used |
| or defined within the current module. This means we cannot emit errors in |
| the case of left-over unmarked symbols. These must (and should) be caught |
| by the initial semantic analysis. |
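The two-phase flow described above can be sketched as a standalone model. Everything here is an assumption made for illustration: `DeferredMark`, `LoweringModel`, `processDirective` and `markDeferred` are hypothetical names, and sets of symbol names stand in for instantiated operations and attached attributes. It is not the code in the lowering bridge.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical record of what the text says is gathered per directive.
struct DeferredMark {
  std::string symbol;
  std::string deviceType;    // e.g. "nohost"
  std::string captureClause; // e.g. "to"
};

struct LoweringModel {
  std::set<std::string> instantiated; // Symbols that already have an operation.
  std::set<std::string> marked;       // Symbols carrying the attribute.
  std::vector<DeferredMark> deferred; // Recorded only when marking must wait.

  // Phase one: mark immediately if the operation exists, otherwise record it.
  void processDirective(const DeferredMark &info) {
    if (instantiated.count(info.symbol))
      marked.insert(info.symbol);
    else
      deferred.push_back(info);
  }

  // Phase two, run once at the end of lowering: retry the recorded symbols.
  // Symbols that still have no operation are returned, not reported as errors.
  std::vector<std::string> markDeferred() {
    std::vector<std::string> unresolved;
    for (const DeferredMark &info : deferred) {
      if (instantiated.count(info.symbol))
        marked.insert(info.symbol);
      else
        unresolved.push_back(info.symbol);
    }
    deferred.clear();
    return unresolved;
  }
};
```

In this model, an interface whose operation is created later in lowering is marked by `markDeferred`, while an interface that is never used or defined simply stays unmarked, mirroring the reason no error can be emitted at that point.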
| |
| NOTE: `declare target` can be applied to implicit `SAVE` attributed variables. |
| However, by default Flang does not represent these as `GlobalOp`'s, which means |
| we cannot tag and lower them as `declare target` normally. Instead, similarly |
| to the way `threadprivate` handles these cases, we raise and initialize the |
| variable as an internal `GlobalOp` and apply the attribute. This occurs in the |
| flang/lib/Lower/OpenMP.cpp function `genDeclareTargetIntGlobal`. |
| |
| # Declare Target Transformation Passes for Flang |
| |
| There are currently two passes within Flang that are related to the processing |
| of `declare target`: |
| * `MarkDeclareTarget` - This pass is in charge of marking functions that are |
| called from `target` regions or from other `declare target` marked functions as |
| `declare target`. It does so recursively, i.e. nested calls will also be |
| implicitly marked. It currently tries to mark things as conservatively as |
| possible, e.g. a function captured in a `target` region is given `nohost`, unless |
| the pass encounters a `host` `declare target`, in which case it applies the `any` |
| device type. Calls made from `declare target` functions are handled similarly, |
| except that we utilise the parent's device type where possible. |
| * `FunctionFiltering` - This is executed after the `MarkDeclareTarget` |
| pass, and its job is to conservatively remove host functions from |
| the module where possible when compiling for the device. This helps make |
| sure that most host code that is incompatible with the device is not lowered |
| for the device. Host functions with `target` regions in them need to be preserved |
| (e.g. for lowering the `target` region(s) inside). Otherwise, it removes |
| any function marked as a `declare target host` function, and any uses are |
| replaced with `undef`'s so that the remaining host code does not become broken. |
| Host functions with `target` regions are marked with a `declare target host` |
| attribute so that they are removed after the target regions contained |
| inside have been outlined. |
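The recursive marking performed by `MarkDeclareTarget` can be modelled on a plain call graph. The sketch below is an assumption-laden illustration rather than the pass itself: `CallGraph`, `Marks` and `markCallees` are hypothetical names, functions are identified by strings, and the entry `"target_region"` stands in for a `target` region whose callees are captured with the `nohost` device type.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

enum class DeviceType { Host, NoHost, Any };

// Caller name -> names of the functions it calls (hypothetical model).
using CallGraph = std::map<std::string, std::vector<std::string>>;
using Marks = std::map<std::string, DeviceType>;

// Propagate `parentTy` to everything reachable from `caller`. Unmarked callees
// inherit the parent's device type; callees already marked with a different
// device type are widened to Any. `visited` guards against call cycles.
inline void markCallees(const CallGraph &graph, const std::string &caller,
                        DeviceType parentTy, Marks &marks,
                        std::set<std::string> &visited) {
  if (!visited.insert(caller).second)
    return; // Already processed this function.
  CallGraph::const_iterator node = graph.find(caller);
  if (node == graph.end())
    return; // Leaf function: nothing further to mark.
  for (const std::string &callee : node->second) {
    Marks::iterator mark = marks.find(callee);
    if (mark == marks.end())
      marks[callee] = parentTy;
    else if (mark->second != parentTy)
      mark->second = DeviceType::Any;
    markCallees(graph, callee, parentTy, marks, visited);
  }
}
```

Starting from a `target` region with `DeviceType::NoHost`, unmarked callees and their nested callees become `NoHost`, while a callee previously marked `Host` ends up as `Any`. Functions that are never reached keep whatever marking they had.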
| |
| While this infrastructure could be generally applicable to more than just Flang, |
| it is only utilised in the Flang frontend, so it resides there rather than in |
| the OpenMP dialect codebase. |
| |
| # Declare Target OpenMP Dialect To LLVM-IR Lowering |
| |
| The OpenMP dialect lowering of `declare target` is done through the |
| `amendOperation` flow, as it's not an `operation` but rather an |
| `attribute`. This is triggered immediately after the corresponding |
| operation has been lowered to LLVM-IR. As it is applicable to |
| different types of operations, we must specialise this function for |
| each operation type that we may encounter. Currently, these are |
| `GlobalOp`'s and `FuncOp`'s. |
| |
| `FuncOp` processing is fairly simple. When compiling for the device, |
| `host` marked functions are removed, including those that could not |
| be removed earlier due to having `target` directives within. This |
| leaves `any`, `nohost` or indeterminable functions in the |
| module to lower further. When compiling for the host, no filtering is |
| done because `nohost` functions must be available as a fallback |
| implementation. |
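The `FuncOp` rule above reduces to a small decision function. The following is a minimal sketch of that rule under stated assumptions, not the `amendOperation` code: the `Mark` enum and `keepFunction` are hypothetical names, and `Mark::Unmarked` models the "indeterminable" case of a function with no `declare target` attribute.

```cpp
#include <cassert>

// Unmarked models a function without any `declare target` attribute.
enum class Mark { Unmarked, Host, NoHost, Any };

// Decide whether a lowered function stays in the module.
inline bool keepFunction(Mark mark, bool compilingForDevice) {
  if (!compilingForDevice)
    return true; // Host: nothing is filtered, `nohost` remains as a fallback.
  return mark != Mark::Host; // Device: only `host` marked functions are dropped.
}
```

For a device compilation this keeps `Any`, `NoHost` and unmarked functions and drops `Host` ones; for a host compilation every function is kept.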
| |
| For `GlobalOp`'s, the processing is a little more complex. We |
| currently leverage the `registerTargetGlobalVariable` and |
| `getAddrOfDeclareTargetVar` `OMPIRBuilder` functions shared with Clang. |
| These two functions invoke each other depending on the clauses and options |
| provided to the `OMPIRBuilder` (in particular, unified shared memory). Their |
| main purposes are the generation of a new global device pointer with a |
| "ref_" prefix on the device and enqueuing metadata generation by the |
| `OMPIRBuilder` to be produced at module finalization time. This is done |
| for both host and device and it links the newly generated device global |
| pointer and the host pointer together across the two modules. |
| |
| Similarly to other metadata (e.g. for `TargetOp`) that is shared across |
| both host and device modules, processing of `GlobalOp`'s in the device |
| needs access to the previously generated host IR file, which is done |
| through another `attribute` applied to the `ModuleOp` by the compiler |
| frontend. The file is loaded in and consumed by the `OMPIRBuilder` to |
| populate its `OffloadInfoManager` data structures, keeping host and |
| device appropriately synchronised. |
| |
| One further point to remember is that, as we effectively replace |
| the original LLVM-IR generated for the `declare target` marked `GlobalOp`, we |
| have some corrections we need to do for `TargetOp`'s (or other region |
| operations that use them directly) which still refer to the original lowered |
| global operation. This is done via `handleDeclareTargetMapVar`, which is invoked |
| as the final function and alteration to the lowered `target` region. It is only |
| invoked for the device, as it is only required in the case where we have emitted |
| the "ref" pointer, and it effectively replaces all uses of the originally lowered |
| global symbol with our new global ref pointer's symbol. Currently we do not |
| remove or delete the old symbol, because the same symbol |
| can be utilised across multiple target regions; if we removed it, we would risk |
| breaking lowerings of target regions that will be processed at a later time. |
| To appropriately delete these no longer necessary symbols we would need a |
| deferred removal process at the end of the module, which is currently not in |
| place. It may be possible to store this information in the `OMPIRBuilder` and |
| then perform this cleanup process on finalization, but this is still open for |
| discussion and implementation. |
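The reason the original symbol has to survive can be shown with a toy model of use replacement. This is only a sketch: `LoweredRegion` and `redirectUses` are hypothetical names, a region is reduced to the list of symbol names it uses, and the ref pointer's name is passed in by the caller rather than derived the way the `OMPIRBuilder` would derive it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a lowered target region: the symbols it uses.
struct LoweredRegion {
  std::vector<std::string> uses;
};

// Redirect every use of `original` inside one region to `refPtr` and return
// the number of uses rewritten. The original symbol is not deleted here, so
// regions that are lowered later can still be redirected in the same way.
inline std::size_t redirectUses(LoweredRegion &region,
                                const std::string &original,
                                const std::string &refPtr) {
  std::size_t rewritten = 0;
  for (std::string &use : region.uses) {
    if (use == original) {
      use = refPtr;
      ++rewritten;
    }
  }
  return rewritten;
}
```

Because each region is rewritten when it is lowered, deleting the original symbol after the first region would leave nothing for later regions to refer to before their own rewrite, which is why the text suggests a deferred cleanup instead.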
| |
| # Current Support |
| |
| For the moment, `declare target` should work for: |
| * Marking functions/subroutines and function/subroutine interfaces for |
| generation on host, device or both. |
| * Implicit function/subroutine capture for calls emitted in a `target` region |
| or explicitly marked `declare target` function/subroutine. Note: Functions called |
| via arguments passed to other functions must still themselves be marked |
| `declare target`. For example, when passing a `C` function pointer and invoking |
| it, the interface and the `C` function in the other module must both be marked |
| `declare target`, with the same type of marking as indicated by the |
| specification. |
| * Marking global variables with `declare target`'s `link` clause and mapping |
| the data to the device data environment utilising `declare target`. This may |
| not work for all types yet, but for scalars and arrays of scalars, it |
| should. |
| |
| Doesn't work for, or needs further testing for: |
| * Marking the following types with `declare target link` (needs further |
| testing): |
| * Descriptor based types, e.g. pointers/allocatables. |
| * Derived types. |
| * Members of derived types (use-case needs legality checking with OpenMP |
| specification). |
| * Marking global variables with `declare target`'s `to` clause. A lot of the |
| lowering should exist, but it needs further testing and likely some further |
| changes to fully function. |