| // Copyright 2017-2023 The Khronos Group Inc. |
| // |
| // SPDX-License-Identifier: CC-BY-4.0 |
| |
| [appendix] |
| [[memory-model]] |
| = Memory Model |
| |
| [NOTE] |
| .Note |
| ==== |
| This memory model describes synchronizations provided by all |
| implementations; however, some of the synchronizations defined require extra |
| features to be supported by the implementation. |
| ifdef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[] |
| See slink:VkPhysicalDeviceVulkanMemoryModelFeatures. |
| endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[] |
| ==== |
| |
| [[memory-model-agent]] |
| == Agent |
| |
| _Operation_ is a general term for any task that is executed on the system. |
| |
| [NOTE] |
| .Note |
| ==== |
| An operation is by definition something that is executed. |
| Thus if an instruction is skipped due to control flow, it does not |
| constitute an operation. |
| ==== |
| |
| Each operation is executed by a particular _agent_. |
| Possible agents include each shader invocation, each host thread, and each |
| fixed-function stage of the pipeline. |
| |
| |
| [[memory-model-memory-location]] |
| == Memory Location |
| |
| A _memory location_ identifies unique storage for 8 bits of data. |
| Memory operations access a _set of memory locations_ consisting of one or |
| more memory locations at a time, e.g. an operation accessing a 32-bit |
| integer in memory would read/write a set of four memory locations. |
| Memory operations that access whole aggregates may: access any padding bytes |
| between elements or members, but no padding bytes at the end of the |
| aggregate. |
| Two sets of memory locations _overlap_ if the intersection of their sets of |
| memory locations is non-empty. |
| A memory operation must: not affect memory at a memory location not within |
| its set of memory locations. |
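
As an informal illustration of the definitions above, the following C++ sketch models a set of memory locations as a contiguous byte range and tests two sets for overlap.
The names `LocationSet`, `scalarAccess`, and `overlap` are hypothetical and exist only for this example; real sets of memory locations are not required to be contiguous.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model: a set of memory locations as a contiguous byte range.
// Each memory location is unique storage for 8 bits of data.
struct LocationSet {
    std::uint64_t first;  // address of the first memory location
    std::size_t   count;  // number of memory locations in the set
};

// A scalar access that is `bits` wide touches bits / 8 memory locations.
constexpr LocationSet scalarAccess(std::uint64_t address, std::size_t bits) {
    return LocationSet{address, bits / 8};
}

// Two sets overlap if the intersection of their memory locations is non-empty.
constexpr bool overlap(const LocationSet& a, const LocationSet& b) {
    return a.count != 0 && b.count != 0 &&
           a.first < b.first + b.count &&
           b.first < a.first + a.count;
}
```

Under this model, a 32-bit access at address 16 covers four memory locations (16 through 19), so it overlaps an 8-bit access at address 19 but not a 32-bit access at address 20.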
| |
| Memory locations for buffers and images are explicitly allocated in |
| slink:VkDeviceMemory objects, and are implicitly allocated for SPIR-V |
| variables in each shader invocation. |
| |
| ifdef::VK_KHR_workgroup_memory_explicit_layout[] |
| Variables with code:Workgroup storage class that point to a block-decorated |
| type share a set of memory locations. |
| endif::VK_KHR_workgroup_memory_explicit_layout[] |
| |
| |
| [[memory-model-allocation]] |
| == Allocation |
| |
| The values stored in newly allocated memory locations are determined by a |
| SPIR-V variable's initializer, if present, or else are undefined:. |
| At the time an allocation is created there have been no |
| <<memory-model-memory-operation,memory operations>> to any of its memory |
| locations. |
| The initialization is not considered to be a memory operation. |
| |
| [NOTE] |
| .Note |
| ==== |
| For tessellation control shader output variables, a consequence of |
| initialization not being considered a memory operation is that some |
| implementations may need to insert a barrier between the initialization of |
| the output variables and any reads of those variables. |
| ==== |
| |
| |
| [[memory-model-memory-operation]] |
| == Memory Operation |
| |
| For an operation A and memory location M: |
| |
| * [[memory-model-access-read]] A _reads_ M if and only if the data stored |
| in M is an input to A. |
| * [[memory-model-access-write]] A _writes_ M if and only if the data |
| output from A is stored to M. |
| * [[memory-model-access-access]] A _accesses_ M if and only if it either |
| reads or writes (or both) M. |
| |
| [NOTE] |
| .Note |
| ==== |
| A write whose value is the same as what was already in those memory |
| locations is still considered to be a write and has all the same effects. |
| ==== |
| |
| |
| [[memory-model-references]] |
| == Reference |
| |
| A _reference_ is an object that a particular agent can: use to access a set |
| of memory locations. |
| On the host, a reference is a host virtual address. |
| On the device, a reference is: |
| |
| * The descriptor that a variable is bound to, for variables in Image, |
| Uniform, or StorageBuffer storage classes. |
| If the variable is an array (or array of arrays, etc.) then each element |
| of the array may: be a unique reference. |
| ifdef::VK_VERSION_1_2,VK_EXT_buffer_device_address,VK_KHR_buffer_device_address[] |
| * The address range for a buffer in code:PhysicalStorageBuffer storage |
| class, where the base of the address range is queried with |
| ifndef::VK_VERSION_1_2,VK_KHR_buffer_device_address[] |
| flink:vkGetBufferDeviceAddressEXT |
| endif::VK_VERSION_1_2,VK_KHR_buffer_device_address[] |
| ifdef::VK_VERSION_1_2,VK_KHR_buffer_device_address[] |
| flink:vkGetBufferDeviceAddress |
| endif::VK_VERSION_1_2,VK_KHR_buffer_device_address[] |
| and the length of the range is the size of the buffer. |
| endif::VK_VERSION_1_2,VK_EXT_buffer_device_address,VK_KHR_buffer_device_address[] |
| ifdef::VK_KHR_workgroup_memory_explicit_layout[] |
| * A single common reference for all variables with code:Workgroup storage |
| class that point to a block-decorated type. |
| * The variable itself for non-block-decorated type variables in |
| code:Workgroup storage class. |
| endif::VK_KHR_workgroup_memory_explicit_layout[] |
| * The variable itself for variables in other storage classes. |
| |
| Two memory accesses through distinct references may: require availability |
| and visibility operations as defined |
| <<memory-model-location-ordered,below>>. |
| |
| |
| [[memory-model-program-order]] |
| == Program-Order |
| |
| A _dynamic instance_ of an instruction is defined in SPIR-V |
| (https://registry.khronos.org/spir-v/specs/unified1/SPIRV.html#DynamicInstance) |
| as a way of referring to a particular execution of a static instruction. |
| Program-order is an ordering on dynamic instances of instructions executed |
| by a single shader invocation: |
| |
| * (Basic block): If instructions A and B are in the same basic block, and |
| A is listed in the module before B, then the n'th dynamic instance of A |
| is program-ordered before the n'th dynamic instance of B. |
| * (Branch): The dynamic instance of a branch or switch instruction is |
| program-ordered before the dynamic instance of the code:OpLabel instruction |
| to which it transfers control. |
| * (Call entry): The dynamic instance of an code:OpFunctionCall instruction |
| is program-ordered before the dynamic instances of the |
| code:OpFunctionParameter instructions and the body of the called |
| function. |
| * (Call exit): The dynamic instance of the instruction following an |
| code:OpFunctionCall instruction is program-ordered after the dynamic |
| instance of the return instruction executed by the called function. |
| * (Transitive Closure): If dynamic instance A of any instruction is |
| program-ordered before dynamic instance B of any instruction and B is |
| program-ordered before dynamic instance C of any instruction then A is |
| program-ordered before C. |
| * (Complete definition): No other dynamic instances are program-ordered. |
| |
| For instructions executed on the host, the source language defines the |
| program-order relation (e.g. as "`sequenced-before`"). |
| |
| |
| ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[] |
| [[shader-call-related]] |
| == Shader Call Related |
| |
| Shader-call-related is an equivalence relation on invocations defined as the |
| symmetric and transitive closure of: |
| |
| * A is shader-call-related to B if A is created by an |
| <<ray-tracing-repack,invocation repack>> instruction executed by B. |
| |
| |
| [[shader-call-order]] |
| == Shader Call Order |
| |
| Shader-call-order is a partial order on dynamic instances of instructions |
| executed by invocations that are shader-call-related: |
| |
| * (Program order): If dynamic instance A is program-ordered before B, then |
| A is shader-call-ordered before B. |
| * (Shader call entry): If A is a dynamic instance of an |
| <<ray-tracing-repack,invocation repack>> instruction and B is a dynamic |
| instance executed by an invocation that is created by A, then A is |
| shader-call-ordered before B. |
| * (Shader call exit): If A is a dynamic instance of an |
| <<ray-tracing-repack,invocation repack>> instruction, B is the next |
| dynamic instance executed by the same invocation, and C is a dynamic |
| instance executed by an invocation that is created by A, then C is |
| shader-call-ordered before B. |
| * (Transitive closure): If A is shader-call-ordered-before B and B is |
| shader-call-ordered-before C, then A is shader-call-ordered-before C. |
| * (Complete definition): No other dynamic instances are |
| shader-call-ordered. |
| endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[] |
| |
| |
| [[memory-model-scope]] |
| == Scope |
| |
| Atomic and barrier instructions include scopes which identify sets of shader |
| invocations that must: obey the requested ordering and atomicity rules of |
| the operation, as defined below. |
| |
| The various scopes are described in detail in <<shaders-scope, the Shaders |
| chapter>>. |
| |
| |
| [[memory-model-atomic-operation]] |
| == Atomic Operation |
| |
| An _atomic operation_ on the device is any SPIR-V operation whose name |
| begins with code:OpAtomic. |
| An atomic operation on the host is any operation performed with an |
| std::atomic typed object. |
| |
| Each atomic operation has a memory <<memory-model-scope,scope>> and a |
| <<memory-model-memory-semantics,semantics>>. |
| Informally, the scope determines which other agents it is atomic with |
| respect to, and the <<memory-model-memory-semantics,semantics>> constrains |
| its ordering against other memory accesses. |
| Device atomic operations have explicit scopes and semantics. |
| Each host atomic operation implicitly uses the code:CrossDevice scope, and |
| uses a memory semantics equivalent to a C++ std::memory_order value of |
| relaxed, acquire, release, acq_rel, or seq_cst. |
| |
| Two atomic operations A and B are _potentially-mutually-ordered_ if and only |
| if all of the following are true: |
| |
| * They access the same set of memory locations. |
| * They use the same reference. |
| * A is in the instance of B's memory scope. |
| * B is in the instance of A's memory scope. |
| * A and B are not the same operation (irreflexive). |
| |
| Two atomic operations A and B are _mutually-ordered_ if and only if they are |
| potentially-mutually-ordered and any of the following are true: |
| |
| * A and B are both device operations. |
| * A and B are both host operations. |
| * A is a device operation, B is a host operation, and the implementation |
| supports concurrent host- and device-atomics. |
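
The two definitions above can be read as a pair of predicates.
The following C++ sketch is one informal way to write them down; the `AtomicOp` structure, its fields, and the `concurrentHostDeviceAtomics` parameter are hypothetical modelling choices for this example and are not part of any API.
The sketch assumes the relation is symmetric, so a mixed host/device pair is treated the same way in either order.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

enum class Side { Device, Host };

// Hypothetical description of one atomic operation.
struct AtomicOp {
    int           id;             // identity of the operation
    Side          side;           // executed on the device or on the host
    int           agent;          // agent that executes the operation
    int           reference;      // reference used for the access
    std::uint64_t firstLocation;  // first memory location accessed
    std::uint64_t locationCount;  // number of memory locations accessed
    std::set<int> scopeInstance;  // agents in the instance of the op's memory scope
};

bool potentiallyMutuallyOrdered(const AtomicOp& a, const AtomicOp& b) {
    return a.firstLocation == b.firstLocation &&    // same set of memory locations
           a.locationCount == b.locationCount &&
           a.reference == b.reference &&            // same reference
           b.scopeInstance.count(a.agent) != 0 &&   // A is in B's scope instance
           a.scopeInstance.count(b.agent) != 0 &&   // B is in A's scope instance
           a.id != b.id;                            // irreflexive
}

bool mutuallyOrdered(const AtomicOp& a, const AtomicOp& b,
                     bool concurrentHostDeviceAtomics) {
    if (!potentiallyMutuallyOrdered(a, b)) {
        return false;
    }
    if (a.side == b.side) {
        return true;  // both device operations, or both host operations
    }
    // Mixed host/device pair: only if the implementation supports it.
    return concurrentHostDeviceAtomics;
}
```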
| |
| [NOTE] |
| .Note |
| ==== |
| If two atomic operations are not mutually-ordered, and if their sets of |
| memory locations overlap, then each must: be synchronized against the other |
| as if they were non-atomic operations. |
| ==== |
| |
| |
| [[memory-model-scoped-modification-order]] |
| == Scoped Modification Order |
| |
| For a given atomic write A, all atomic writes that are mutually-ordered with |
| A occur in an order known as A's _scoped modification order_. |
| A's scoped modification order relates no other operations. |
| |
| [NOTE] |
| .Note |
| ==== |
| Invocations outside the instance of A's memory scope may: observe the values |
| at A's set of memory locations becoming visible to them in an order that |
| disagrees with the scoped modification order. |
| ==== |
| |
| [NOTE] |
| .Note |
| ==== |
| It is valid to have non-atomic operations or atomics in a different scope |
| instance to the same set of memory locations, as long as they are |
| synchronized against each other as if they were non-atomic (if they are not, |
| it is treated as a <<memory-model-access-data-race,data race>>). |
| That means this definition of A's scoped modification order could include |
| atomic operations that occur much later, after intervening non-atomics. |
| That is a bit non-intuitive, but it helps to keep this definition simple and |
| non-circular. |
| ==== |
| |
| |
| [[memory-model-memory-semantics]] |
| == Memory Semantics |
| |
| Non-atomic memory operations, by default, may: be observed by one agent in a |
| different order than they were written by another agent. |
| |
| Atomics and some synchronization operations include _memory semantics_, |
| which are flags that constrain the order in which other memory accesses |
| (including non-atomic memory accesses and |
| <<memory-model-availability-visibility,availability and visibility |
| operations>>) performed by the same agent can: be observed by other agents, |
| or can: observe accesses by other agents. |
| |
| Device instructions that include semantics are code:OpAtomic*, |
| code:OpControlBarrier, code:OpMemoryBarrier, and code:OpMemoryNamedBarrier. |
| Host instructions that include semantics are some std::atomic methods and |
| memory fences. |
| |
| SPIR-V supports the following memory semantics: |
| |
| * Relaxed: No constraints on order of other memory accesses. |
| * Acquire: A memory read with this semantic performs an _acquire |
| operation_. |
| A memory barrier with this semantic is an _acquire barrier_. |
| * Release: A memory write with this semantic performs a _release |
| operation_. |
| A memory barrier with this semantic is a _release barrier_. |
| * AcquireRelease: A memory read-modify-write operation with this semantic |
| performs both an acquire operation and a release operation, and inherits |
| the limitations on ordering from both of those operations. |
| A memory barrier with this semantic is both a release and acquire |
| barrier. |
| |
| [NOTE] |
| .Note |
| ==== |
| SPIR-V does not support "`consume`" semantics on the device. |
| ==== |
| |
| The memory semantics operand also includes _storage class semantics_ which |
| indicate which storage classes are constrained by the synchronization. |
| SPIR-V storage class semantics include: |
| |
| * UniformMemory |
| * WorkgroupMemory |
| * ImageMemory |
| * OutputMemory |
| |
| Each SPIR-V memory operation accesses a single storage class. |
| Semantics in synchronization operations can include a combination of storage |
| classes. |
| |
| The UniformMemory storage class semantic applies to accesses to memory in |
| the |
| ifdef::VK_VERSION_1_2,VK_EXT_buffer_device_address,VK_KHR_buffer_device_address[] |
| PhysicalStorageBuffer, |
| endif::VK_VERSION_1_2,VK_EXT_buffer_device_address,VK_KHR_buffer_device_address[] |
| ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[] |
| code:ShaderRecordBufferKHR, |
| endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[] |
| Uniform and StorageBuffer storage classes. |
| The WorkgroupMemory storage class semantic applies to accesses to memory in |
| the Workgroup storage class. |
| The ImageMemory storage class semantic applies to accesses to memory in the |
| Image storage class. |
| The OutputMemory storage class semantic applies to accesses to memory in the |
| Output storage class. |
| |
| [NOTE] |
| .Note |
| ==== |
| Informally, these constraints limit how memory operations can be reordered, |
| and these limits apply not only to the order of accesses as performed in the |
| agent that executes the instruction, but also to the order the effects of |
| writes become visible to all other agents within the same instance of the |
| instruction's memory scope. |
| ==== |
| |
| [NOTE] |
| .Note |
| ==== |
| Release and acquire operations in different threads can: act as |
| synchronization operations, to guarantee that writes that happened before |
| the release are visible after the acquire. |
| (This is not a formal definition, just an informative forward reference.) |
| ==== |
| |
| [NOTE] |
| .Note |
| ==== |
| The OutputMemory storage class semantic is only useful in tessellation |
| control shaders, which is the only execution model where output variables |
| are shared between invocations. |
| ==== |
| |
| The memory semantics operand can: also include availability and visibility |
| flags, which apply availability and visibility operations as described in |
| <<memory-model-availability-visibility,availability and visibility>>. |
| The availability/visibility flags are: |
| |
| * MakeAvailable: Semantics must: be Release or AcquireRelease. |
| Performs an availability operation before the release operation or |
| barrier. |
| * MakeVisible: Semantics must: be Acquire or AcquireRelease. |
| Performs a visibility operation after the acquire operation or barrier. |
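
The pairing rules in the list above amount to a simple validity check on the semantics operand.
The following C++ sketch shows one way to express that check; the bit values in `SemanticsBits` are hypothetical and chosen only for this example, they are not the SPIR-V encodings.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag values for illustration only (not the SPIR-V encodings).
enum SemanticsBits : std::uint32_t {
    Relaxed        = 0,
    Acquire        = 1u << 0,
    Release        = 1u << 1,
    AcquireRelease = 1u << 2,
    MakeAvailable  = 1u << 3,
    MakeVisible    = 1u << 4
};

// Returns true if the availability/visibility flags are paired with
// semantics that permit them.
bool validAvVisFlags(std::uint32_t semantics) {
    const bool releases = (semantics & (Release | AcquireRelease)) != 0;
    const bool acquires = (semantics & (Acquire | AcquireRelease)) != 0;
    if ((semantics & MakeAvailable) != 0 && !releases) {
        return false;  // MakeAvailable needs Release or AcquireRelease
    }
    if ((semantics & MakeVisible) != 0 && !acquires) {
        return false;  // MakeVisible needs Acquire or AcquireRelease
    }
    return true;
}
```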
| |
| The specifics of these operations are defined in |
| <<memory-model-availability-visibility-semantics,Availability and Visibility |
| Semantics>>. |
| |
| Host atomic operations may: support a different list of memory semantics and |
| synchronization operations, depending on the host architecture and source |
| language. |
| |
| |
| [[memory-model-release-sequence]] |
| == Release Sequence |
| |
| After an atomic operation A performs a release operation on a set of memory |
| locations M, the _release sequence headed by A_ is the longest continuous |
| subsequence of A's scoped modification order that consists of: |
| |
| * the atomic operation A as its first element |
| * atomic read-modify-write operations on M by any agent |
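
As an informal sketch of this definition, the following C++ function extracts the release sequence headed by an operation from a scoped modification order that is given as a list.
The `AtomicWrite` structure and the function name are hypothetical, and the sketch assumes that the operation at `headIndex` performs a release operation and that every entry operates on the same set of memory locations M.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Kind { Write, ReadModifyWrite };

// Hypothetical entry in a scoped modification order.
struct AtomicWrite {
    int  id;    // identity of the atomic operation
    Kind kind;  // plain atomic write or atomic read-modify-write
};

// Returns the ids in the release sequence headed by order[headIndex]:
// the head itself, followed by the longest continuous run of atomic
// read-modify-write operations that comes directly after it.
std::vector<int> releaseSequence(const std::vector<AtomicWrite>& order,
                                 std::size_t headIndex) {
    std::vector<int> sequence;
    if (headIndex >= order.size()) {
        return sequence;
    }
    sequence.push_back(order[headIndex].id);
    for (std::size_t i = headIndex + 1; i < order.size(); ++i) {
        if (order[i].kind != Kind::ReadModifyWrite) {
            break;  // a plain atomic write ends the sequence
        }
        sequence.push_back(order[i].id);
    }
    return sequence;
}
```

For a modification order of write 1, read-modify-writes 2 and 3, write 4, and read-modify-write 5, the release sequence headed by operation 1 is 1, 2, 3, and the one headed by operation 4 is 4, 5.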
| |
| [NOTE] |
| .Note |
| ==== |
| The atomics in the last bullet must: be mutually-ordered with A by virtue of |
| being in A's scoped modification order. |
| ==== |
| |
| [NOTE] |
| .Note |
| ==== |
| This intentionally omits "`atomic writes to M performed by the same agent |
| that performed A`", which is present in the corresponding C++ definition. |
| ==== |
| |
| |
| [[memory-model-synchronizes-with]] |
| == Synchronizes-With |
| |
| _Synchronizes-with_ is a relation between operations, where each operation |
| is either an atomic operation or a memory barrier (aka fence on the host). |
| |
| If A and B are atomic operations, then A synchronizes-with B if and only if |
| all of the following are true: |
| |
| * A performs a release operation |
| * B performs an acquire operation |
| * A and B are mutually-ordered |
| * B reads a value written by A or by an operation in the release sequence |
| headed by A |
| |
| code:OpControlBarrier, code:OpMemoryBarrier, and code:OpMemoryNamedBarrier |
| are _memory barrier_ instructions in SPIR-V. |
| |
| If A is a release barrier and B is an atomic operation that performs an |
| acquire operation, then A synchronizes-with B if and only if all of the |
| following are true: |
| |
| * there exists an atomic write X (with any memory semantics) |
| * A is program-ordered before X |
| * X and B are mutually-ordered |
| * B reads a value written by X or by an operation in the release sequence |
| headed by X |
| ** If X is relaxed, it is still considered to head a hypothetical release |
| sequence for this rule |
| * A and B are in the instance of each other's memory scopes |
| * X's storage class is in A's semantics. |
| |
| If A is an atomic operation that performs a release operation and B is an |
| acquire barrier, then A synchronizes-with B if and only if all of the |
| following are true: |
| |
| * there exists an atomic read X (with any memory semantics) |
| * X is program-ordered before B |
| * X and A are mutually-ordered |
| * X reads a value written by A or by an operation in the release sequence |
| headed by A |
| * A and B are in the instance of each other's memory scopes |
| * X's storage class is in B's semantics. |
| |
| If A is a release barrier and B is an acquire barrier, then A |
| synchronizes-with B if all of the following are true: |
| |
| * there exists an atomic write X (with any memory semantics) |
| * A is program-ordered before X |
| * there exists an atomic read Y (with any memory semantics) |
| * Y is program-ordered before B |
| * X and Y are mutually-ordered |
| * Y reads the value written by X or by an operation in the release |
| sequence headed by X |
| ** If X is relaxed, it is still considered to head a hypothetical release |
| sequence for this rule |
| * A and B are in the instance of each other's memory scopes |
| * X's and Y's storage class is in A's and B's semantics. |
| ** NOTE: X and Y must have the same storage class, because they are |
| mutually ordered. |
| |
| If A is a release barrier, B is an acquire barrier, and C is a control |
| barrier (where A can: equal C, and B can: equal C), then A synchronizes-with |
| B if all of the following are true: |
| |
| * A is program-ordered before (or equals) C |
| * C is program-ordered before (or equals) B |
| * A and B are in the instance of each other's memory scopes |
| * A and B are in the instance of C's execution scope |
| |
| [NOTE] |
| .Note |
| ==== |
| This is similar to the barrier-barrier synchronization above, but with a |
| control barrier filling the role of the relaxed atomics. |
| ==== |
| |
| ifdef::VK_EXT_fragment_shader_interlock[] |
| |
| Let F be an ordering of fragment shader invocations, such that invocation |
| F~1~ is ordered before invocation F~2~ if and only if F~1~ and F~2~ overlap |
| as described in <<shaders-scope-fragment-interlock,Fragment Shader |
| Interlock>> and F~1~ executes the interlocked code before F~2~. |
| |
| If A is an code:OpEndInvocationInterlockEXT instruction and B is an |
| code:OpBeginInvocationInterlockEXT instruction, then A synchronizes-with B |
| if the agent that executes A is ordered before the agent that executes B in |
| F. A and B are both considered to have code:FragmentInterlock memory scope |
| and semantics of UniformMemory and ImageMemory, and A is considered to have |
| Release semantics and B is considered to have Acquire semantics. |
| |
| [NOTE] |
| .Note |
| ==== |
| code:OpBeginInvocationInterlockEXT and code:OpEndInvocationInterlockEXT do |
| not perform implicit availability or visibility operations. |
| Usually, shaders using fragment shader interlock will declare the relevant |
| resources as `coherent` to get implicit |
| <<memory-model-instruction-av-vis,per-instruction availability and |
| visibility operations>>. |
| ==== |
| |
| endif::VK_EXT_fragment_shader_interlock[] |
| |
| ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[] |
| If A is a release barrier and B is an acquire barrier, then A |
| synchronizes-with B if all of the following are true: |
| |
| * A is shader-call-ordered-before B |
| * A and B are in the instance of each other's memory scopes |
| |
| endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[] |
| |
| No other release and acquire barriers synchronize-with each other. |
| |
| |
| [[memory-model-system-synchronizes-with]] |
| == System-Synchronizes-With |
| |
| _System-synchronizes-with_ is a relation between arbitrary operations on the |
| device or host. |
| Certain operations system-synchronize-with each other, which informally |
| means the first operation occurs before the second and that the |
| synchronization is performed without using application-visible memory |
| accesses. |
| |
| If there is an <<synchronization-dependencies-execution,execution |
| dependency>> between two operations A and B, then the operation in the first |
| synchronization scope system-synchronizes-with the operation in the second |
| synchronization scope. |
| |
| [NOTE] |
| .Note |
| ==== |
| This covers all Vulkan synchronization primitives, including device |
| operations executing before a synchronization primitive is signaled, wait |
| operations happening before subsequent device operations, signal operations |
| happening before host operations that wait on them, and host operations |
| happening before flink:vkQueueSubmit. |
| The list is spread throughout the synchronization chapter, and is not |
| repeated here. |
| ==== |
| |
| System-synchronizes-with implicitly includes all storage class semantics and |
| has code:CrossDevice scope. |
| |
| If A system-synchronizes-with B, we also say A is |
| _system-synchronized-before_ B and B is _system-synchronized-after_ A. |
| |
| |
| [[memory-model-non-private]] |
| == Private vs. Non-Private |
| |
| By default, non-atomic memory operations are treated as _private_, meaning |
| such a memory operation is not intended to be used for communication with |
| other agents. |
| Memory operations with the NonPrivatePointer/NonPrivateTexel bit set are |
| treated as _non-private_, and are intended to be used for communication with |
| other agents. |
| |
| More precisely, making private memory operations |
| <<memory-model-location-ordered,Location-Ordered>> between distinct agents |
| requires using system-synchronizes-with rather than shader-based |
| synchronization. |
| Private memory operations still obey program-order. |
| |
| Atomic operations are always considered non-private. |
| |
| |
| [[memory-model-inter-thread-happens-before]] |
| == Inter-Thread-Happens-Before |
| |
| Let SC be a non-empty set of storage class semantics. |
| Then (using template syntax) operation A _inter-thread-happens-before_<SC> |
| operation B if and only if any of the following is true: |
| |
| * A system-synchronizes-with B |
| * A synchronizes-with B, and both A and B have all of SC in their |
| semantics |
| * A is an operation on memory in a storage class in SC or that has all of |
| SC in its semantics, B is a release barrier or release atomic with all |
| of SC in its semantics, and A is program-ordered before B |
| * A is an acquire barrier or acquire atomic with all of SC in its |
| semantics, B is an operation on memory in a storage class in SC or that |
| has all of SC in its semantics, and A is program-ordered before B |
| * A and B are both host operations and A inter-thread-happens-before B as |
| defined in the host language specification |
| * A inter-thread-happens-before<SC> some X and X |
| inter-thread-happens-before<SC> B |
| |
| |
| [[memory-model-happens-before]] |
| == Happens-Before |
| |
| Operation A _happens-before_ operation B if and only if any of the following |
| is true: |
| |
| * A is program-ordered before B |
| * A inter-thread-happens-before<SC> B for some set of storage classes SC |
| |
| _Happens-after_ is defined similarly. |
| |
| [NOTE] |
| .Note |
| ==== |
| Unlike C++, happens-before is not always sufficient for a write to be |
| visible to a read. |
| Additional <<memory-model-availability-visibility,availability and |
| visibility>> operations may: be required for writes to be |
| <<memory-model-visible-to,visible-to>> other memory accesses. |
| ==== |
| |
| [NOTE] |
| .Note |
| ==== |
| Happens-before is not transitive, but each of program-order and |
| inter-thread-happens-before<SC> is transitive. |
| These can be thought of as covering the "`single-threaded`" case and the |
| "`multi-threaded`" case, and it is not necessary (and not valid) to form |
| chains between the two. |
| ==== |
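
The non-transitivity of happens-before can be illustrated with a small C++ sketch that treats each relation as a set of ordered pairs of operation identifiers.
This is a simplified model under stated assumptions: the program-order and inter-thread-happens-before edges are supplied directly as inputs, a single set of storage classes SC is assumed, and the names `Relation`, `transitiveClosure`, and `happensBefore` are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <utility>

using Edge     = std::pair<int, int>;  // (earlier operation, later operation)
using Relation = std::set<Edge>;

// Repeatedly adds (a, c) whenever (a, b) and (b, c) are present.
Relation transitiveClosure(Relation r) {
    bool changed = true;
    while (changed) {
        Relation next = r;
        for (const Edge& ab : r) {
            for (const Edge& bc : r) {
                if (ab.second == bc.first) {
                    next.emplace(ab.first, bc.second);
                }
            }
        }
        changed = next.size() != r.size();
        r = std::move(next);
    }
    return r;
}

// A happens-before B if A is program-ordered before B, or if A
// inter-thread-happens-before B. Each relation is closed on its own;
// chains that mix the two relations are deliberately not formed.
bool happensBefore(const Relation& programOrder, const Relation& interThread,
                   int a, int b) {
    const Edge edge(a, b);
    return transitiveClosure(programOrder).count(edge) != 0 ||
           transitiveClosure(interThread).count(edge) != 0;
}
```

With program-order containing only (1, 2) and inter-thread-happens-before containing only (2, 3), operation 1 happens-before 2 and operation 2 happens-before 3, but operation 1 does not happen-before 3.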
| |
| |
| [[memory-model-availability-visibility]] |
| == Availability and Visibility |
| |
| _Availability_ and _visibility_ are states of a write operation, which |
| (informally) track how far the write has permeated the system, i.e. which |
| agents and references are able to observe the write. |
| Availability state is per _memory domain_. |
| Visibility state is per (agent,reference) pair. |
| Availability and visibility states are per-memory location for each write. |
| |
| Memory domains are named according to the agents whose memory accesses use |
| the domain. |
| Domains used by shader invocations are organized hierarchically into |
| multiple smaller memory domains which correspond to the different |
| <<shaders-scope, scopes>>. |
| Each memory domain is considered the _dual_ of a scope, and vice versa. |
| The memory domains defined in Vulkan include: |
| |
| * _host_ - accessible by host agents |
| * _device_ - accessible by all device agents for a particular device |
| * _shader_ - accessible by shader agents for a particular device, |
| corresponding to the code:Device scope |
| * _queue family instance_ - accessible by shader agents in a single queue |
| family, corresponding to the code:QueueFamily scope. |
| ifdef::VK_EXT_fragment_shader_interlock[] |
| * _fragment interlock instance_ - accessible by fragment shader agents |
| that <<shaders-scope-fragment-interlock,overlap>>, corresponding to the |
| code:FragmentInterlock scope. |
| endif::VK_EXT_fragment_shader_interlock[] |
| ifdef::VK_KHR_ray_tracing_pipeline[] |
| * _shader call instance_ - accessible by shader agents that are |
| <<shader-call-related,shader-call-related>>, corresponding to the |
| code:ShaderCallKHR scope. |
| endif::VK_KHR_ray_tracing_pipeline[] |
| * _workgroup instance_ - accessible by shader agents in the same |
| workgroup, corresponding to the code:Workgroup scope. |
| * _subgroup instance_ - accessible by shader agents in the same subgroup, |
| corresponding to the code:Subgroup scope. |
| |
| The memory domains are nested in the order listed above, |
| ifdef::VK_KHR_ray_tracing_pipeline[] |
| except for shader call instance domain, |
| endif::VK_KHR_ray_tracing_pipeline[] |
| with memory domains later in the list nested in the domains earlier in the |
| list. |
| ifdef::VK_KHR_ray_tracing_pipeline[] |
| The shader call instance domain is at an implementation-dependent location |
| in the list, and is nested according to that location. |
| The shader call instance domain is not broader than the queue family |
| instance domain. |
| endif::VK_KHR_ray_tracing_pipeline[] |
| |
| [NOTE] |
| .Note |
| ==== |
| Memory domains do not correspond to storage classes or device-local and |
| host-local slink:VkDeviceMemory allocations; rather, they indicate whether a |
| write can be made visible only to agents in the same subgroup, same |
| workgroup, |
| ifdef::VK_EXT_fragment_shader_interlock[] |
| overlapping fragment shader invocation, |
| endif::VK_EXT_fragment_shader_interlock[] |
| ifdef::VK_KHR_ray_tracing_pipeline[] |
| shader-call-related ray tracing invocation, |
| endif::VK_KHR_ray_tracing_pipeline[] |
| in any shader invocation, or anywhere on the device, or host. |
| The shader, queue family instance, |
| ifdef::VK_EXT_fragment_shader_interlock[] |
| fragment interlock instance, |
| endif::VK_EXT_fragment_shader_interlock[] |
| ifdef::VK_KHR_ray_tracing_pipeline[] |
| shader call instance, |
| endif::VK_KHR_ray_tracing_pipeline[] |
| workgroup instance, and subgroup instance domains are only used for |
| shader-based availability/visibility operations; in other cases, writes can |
| be made available from/visible to the shader via the device domain. |
| ==== |
| |
| _Availability operations_, _visibility operations_, and _memory domain |
| operations_ alter the state of the write operations that happen-before them, |
| and which are included in their _source scope_ to be available or visible to |
| their _destination scope_. |
| |
| * For an availability operation, the source scope is a set of |
| (agent,reference,memory location) tuples, and the destination scope is a |
| set of memory domains. |
| * For a memory domain operation, the source scope is a memory domain and |
| the destination scope is a memory domain. |
| * For a visibility operation, the source scope is a set of memory domains |
| and the destination scope is a set of (agent,reference,memory location) |
| tuples. |
| |
| How the scopes are determined depends on the specific operation. |
| Availability and memory domain operations expand the set of memory domains |
| to which the write is available. |
| Visibility operations expand the set of (agent,reference,memory location) |
| tuples to which the write is visible. |
| |
| Recall that availability and visibility states are per-memory location, and |
| let W be a write operation to one or more locations performed by agent A via |
| reference R. Let L be one of the locations written. |
| (W,L), the write W to L, is initially not available to any memory domain |
| and only visible to (A,R,L). |
| An availability operation AV that happens-after W and that includes (A,R,L) |
| in its source scope makes (W,L) _available_ to the memory domains in its |
| destination scope. |
| |
| A memory domain operation DOM that happens-after AV and for which (W,L) is |
| available in the source scope makes (W,L) available in the destination |
| memory domain. |
| |
| A visibility operation VIS that happens-after AV (or DOM) and for which |
| (W,L) is available in any domain in the source scope makes (W,L) _visible_ |
| to all (agent,reference,L) tuples included in its destination scope. |
| |
| If write W~2~ happens-after W, and their sets of memory locations overlap, |
| then W is no longer available/visible to any agents/references for those |
| memory locations that overlap (and future AV/DOM/VIS ops cannot revive W's |
| write to those locations). |
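| |
| As an informal illustration of the rules above, the state of a single (W,L) |
| pair can be modeled as two sets that only grow until an overlapping write |
| replaces them. |
| The sketch below is Python rather than Vulkan API code; the `WriteState` |
| class and its method names are hypothetical, and it assumes the operations |
| are applied in happens-before order. |
| |
```python
# Illustrative model only: names are hypothetical, not part of any Vulkan API.
class WriteState:
    """Tracks availability/visibility of one (W, L) pair."""

    def __init__(self, agent, ref):
        # Initially not available to any domain and visible only to (A, R, L).
        self.source = (agent, ref)
        self.available = set()           # memory domains
        self.visible = {(agent, ref)}    # (agent, reference) tuples for L
        self.overwritten = False

    def availability_op(self, src_tuples, dst_domains):
        # AV: acts only if (A, R, L) is in the source scope.
        if not self.overwritten and self.source in src_tuples:
            self.available |= set(dst_domains)

    def domain_op(self, src_domain, dst_domain):
        # DOM: acts only if (W, L) is already available in the source domain.
        if not self.overwritten and src_domain in self.available:
            self.available.add(dst_domain)

    def visibility_op(self, src_domains, dst_tuples):
        # VIS: acts if (W, L) is available in any domain of the source scope.
        if not self.overwritten and self.available & set(src_domains):
            self.visible |= set(dst_tuples)

    def overwrite(self):
        # A later overlapping write W2 hides W; later ops cannot revive it.
        self.overwritten = True
        self.available.clear()
        self.visible.clear()


w = WriteState("A", "R")
w.visibility_op({"device"}, {("B", "R2")})    # no effect: not yet available
early = ("B", "R2") in w.visible
w.availability_op({("A", "R")}, {"device"})
w.visibility_op({"device"}, {("B", "R2")})
late = ("B", "R2") in w.visible
```
| |
| Here the first visibility operation has no effect because (W,L) is not yet |
| available in the device domain, while the second one, which follows the |
| availability operation, makes the write visible to the second tuple. |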
| |
| Availability, memory domain, and visibility operations are treated like |
| other non-atomic memory accesses for the purpose of |
| <<memory-model-memory-semantics,memory semantics>>, meaning they can be |
| ordered by release-acquire sequences or memory barriers. |
| |
| An _availability chain_ is a sequence of availability operations to |
| increasingly broad memory domains, where element N+1 of the chain is |
| performed in the dual scope instance of the destination memory domain of |
| element N and element N happens-before element N+1. |
| An example is an availability operation with destination scope of the |
| workgroup instance domain that happens-before an availability operation to |
| the shader domain performed by an invocation in the same workgroup. |
| An availability chain AVC that happens-after W and that includes (A,R,L) in |
| the source scope makes (W,L) _available_ to the memory domains in its final |
| destination scope. |
| An availability chain with a single element is just the availability |
| operation. |
| |
| Similarly, a _visibility chain_ is a sequence of visibility operations from |
| increasingly narrow memory domains, where element N of the chain is |
| performed in the dual scope instance of the source memory domain of element |
| N+1 and element N happens-before element N+1. |
| An example is a visibility operation with source scope of the shader domain |
| that happens-before a visibility operation with source scope of the |
| workgroup instance domain performed by an invocation in the same workgroup. |
| A visibility chain VISC that happens-after AVC (or DOM) and for which (W,L) |
| is available in any domain in the source scope makes (W,L) _visible_ to all |
| (agent,reference,L) tuples included in its final destination scope. |
| A visibility chain with a single element is just the visibility operation. |
| |
| |
| [[memory-model-vulkan-availability-visibility]] |
| == Availability, Visibility, and Domain Operations |
| |
| The following operations generate availability, visibility, and domain |
| operations. |
| When multiple availability/visibility/domain operations are described, they |
| are system-synchronized-with each other in the order listed. |
| |
| An operation that performs a <<synchronization-dependencies-memory,memory |
| dependency>> generates: |
| |
| * If the source access mask includes ename:VK_ACCESS_HOST_WRITE_BIT, then |
| the dependency includes a memory domain operation from host domain to |
| device domain. |
| * An availability operation with source scope of all writes in the first |
| <<synchronization-dependencies-access-scopes,access scope>> of the |
| dependency and a destination scope of the device domain. |
| * A visibility operation with source scope of the device domain and |
| destination scope of the second access scope of the dependency. |
| * If the destination access mask includes ename:VK_ACCESS_HOST_READ_BIT or |
| ename:VK_ACCESS_HOST_WRITE_BIT, then the dependency includes a memory |
| domain operation from device domain to host domain. |
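| |
| The list above can be read as a small decision procedure. |
| The following Python sketch is illustrative only: the function name |
| `dependency_operations` and the string labels are hypothetical, while the |
| two flag values are the ones defined for elink:VkAccessFlagBits. |
| |
```python
# Flag values as defined in the Vulkan headers.
VK_ACCESS_HOST_READ_BIT = 0x00002000
VK_ACCESS_HOST_WRITE_BIT = 0x00004000


def dependency_operations(src_access_mask, dst_access_mask):
    """Return the operations a memory dependency generates, in listed order.

    Hypothetical helper: the returned strings are labels, not API objects.
    """
    ops = []
    if src_access_mask & VK_ACCESS_HOST_WRITE_BIT:
        ops.append("DOM(host -> device)")
    ops.append("AV(first access scope -> device domain)")
    ops.append("VIS(device domain -> second access scope)")
    if dst_access_mask & (VK_ACCESS_HOST_READ_BIT | VK_ACCESS_HOST_WRITE_BIT):
        ops.append("DOM(device -> host)")
    return ops


host_upload = dependency_operations(VK_ACCESS_HOST_WRITE_BIT, 0)
host_readback = dependency_operations(0, VK_ACCESS_HOST_READ_BIT)
```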
| |
| flink:vkFlushMappedMemoryRanges performs an availability operation, with a |
| source scope of (agents,references) = (all host threads, all mapped memory |
| ranges passed to the command), and destination scope of the host domain. |
| |
| flink:vkInvalidateMappedMemoryRanges performs a visibility operation, with a |
| source scope of the host domain and a destination scope of |
| (agents,references) = (all host threads, all mapped memory ranges passed to |
| the command). |
| |
| flink:vkQueueSubmit performs a memory domain operation from host to device, |
| and a visibility operation with source scope of the device domain and |
| destination scope of all agents and references on the device. |
| |
| |
| [[memory-model-availability-visibility-semantics]] |
| == Availability and Visibility Semantics |
| |
| A memory barrier or atomic operation via agent A that includes MakeAvailable |
| in its semantics performs an availability operation whose source scope |
| includes agent A and all references in the storage classes in that |
| instruction's storage class semantics, and all memory locations, and whose |
| destination scope is a set of memory domains selected as specified below. |
| The implicit availability operation is program-ordered between the barrier |
| or atomic and all other operations program-ordered before the barrier or |
| atomic. |
| |
| A memory barrier or atomic operation via agent A that includes MakeVisible |
| in its semantics performs a visibility operation whose source scope is a set |
| of memory domains selected as specified below, and whose destination scope |
| includes agent A and all references in the storage classes in that |
| instruction's storage class semantics, and all memory locations. |
| The implicit visibility operation is program-ordered between the barrier or |
| atomic and all other operations program-ordered after the barrier or atomic. |
| |
| The memory domains are selected based on the memory scope of the instruction |
| as follows: |
| |
| * code:Device scope uses the shader domain |
| * code:QueueFamily scope uses the queue family instance domain |
| ifdef::VK_EXT_fragment_shader_interlock[] |
| * code:FragmentInterlock scope uses the fragment interlock instance domain |
| endif::VK_EXT_fragment_shader_interlock[] |
| ifdef::VK_KHR_ray_tracing_pipeline[] |
| * code:ShaderCallKHR scope uses the shader call instance domain |
| endif::VK_KHR_ray_tracing_pipeline[] |
| * code:Workgroup scope uses the workgroup instance domain |
| * code:Subgroup scope uses the subgroup instance domain |
| * code:Invocation scope performs no availability/visibility operations. |
| |
| When an availability operation performed by an agent A includes a memory |
| domain D in its destination scope, where D corresponds to scope instance S, |
| it also includes the memory domains that correspond to each smaller scope |
| instance S' that is a subset of S and that includes A. Similarly for |
| visibility operations. |
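| |
| A minimal Python sketch of the scope-to-domain selection, including the |
| rule that smaller enclosed scope instances are also covered, is shown |
| below. |
| It is illustrative only: the helper `destination_domains` is hypothetical, |
| the extension scopes are omitted, and it assumes the agent is a shader |
| invocation for which the remaining scopes nest in the order given by |
| `NESTING`. |
| |
```python
# Mapping taken from the list above; extension scopes are left out.
SCOPE_TO_DOMAIN = {
    "Device": "shader",
    "QueueFamily": "queue family instance",
    "Workgroup": "workgroup instance",
    "Subgroup": "subgroup instance",
    "Invocation": None,   # performs no availability/visibility operations
}

# Assumed nesting, broadest first, for a single invocation.
NESTING = ["Device", "QueueFamily", "Workgroup", "Subgroup"]


def destination_domains(scope):
    """Domains covered by an availability operation with the given scope.

    Includes the domains of every narrower scope instance that contains
    the agent, as described in the paragraph above.
    """
    if SCOPE_TO_DOMAIN[scope] is None:
        return []
    start = NESTING.index(scope)
    return [SCOPE_TO_DOMAIN[s] for s in NESTING[start:]]


workgroup_domains = destination_domains("Workgroup")
```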
| |
| |
| [[memory-model-instruction-av-vis]] |
| == Per-Instruction Availability and Visibility Semantics |
| |
| A memory write instruction that includes MakePointerAvailable, or an image |
| write instruction that includes MakeTexelAvailable, performs an availability |
| operation whose source scope includes the agent and reference used to |
| perform the write and the memory locations written by the instruction, and |
| whose destination scope is a set of memory domains selected by the Scope |
| operand specified in <<memory-model-availability-visibility-semantics, |
| Availability and Visibility Semantics>>. |
| The implicit availability operation is program-ordered between the write and |
| all other operations program-ordered after the write. |
| |
| A memory read instruction that includes MakePointerVisible, or an image read |
| instruction that includes MakeTexelVisible, performs a visibility operation |
| whose source scope is a set of memory domains selected by the Scope operand |
| as specified in <<memory-model-availability-visibility-semantics, |
| Availability and Visibility Semantics>>, and whose destination scope |
| includes the agent and reference used to perform the read and the memory |
| locations read by the instruction. |
| The implicit visibility operation is program-ordered between the read and |
| all other operations program-ordered before the read. |
| |
| [NOTE] |
| .Note |
| ==== |
| Although reads with per-instruction visibility only perform visibility ops |
| from the shader or |
| ifdef::VK_EXT_fragment_shader_interlock[] |
| fragment interlock instance or |
| endif::VK_EXT_fragment_shader_interlock[] |
| ifdef::VK_KHR_ray_tracing_pipeline[] |
| shader call instance or |
| endif::VK_KHR_ray_tracing_pipeline[] |
| workgroup instance or subgroup instance domain, they will also see writes |
| that were made visible via the device domain, i.e. those writes previously |
| performed by non-shader agents and made visible via API commands. |
| ==== |
| |
| [NOTE] |
| .Note |
| ==== |
| It is expected that all invocations in a subgroup execute on the same |
| processor with the same path to memory, and thus availability and visibility |
| operations with subgroup scope can be expected to be "`free`". |
| ==== |
| |
| |
| [[memory-model-location-ordered]] |
| == Location-Ordered |
| |
| Let X and Y be memory accesses to overlapping sets of memory locations M, |
| where X != Y. Let (A~X~,R~X~) be the agent and reference used for X, and |
| (A~Y~,R~Y~) be the agent and reference used for Y. For now, let "`->`" |
| denote happens-before and "`->^rcpo^`" denote the reflexive closure of |
| program-ordered before. |
| |
| If D~1~ and D~2~ are different memory domains, then let DOM(D~1~,D~2~) be a |
| memory domain operation from D~1~ to D~2~. |
| Otherwise, let DOM(D,D) be a placeholder such that X->DOM(D,D)->Y if and |
| only if X->Y. |
| |
| X is _location-ordered_ before Y for a location L in M if and only if any of |
| the following is true: |
| |
| * A~X~ == A~Y~ and R~X~ == R~Y~ and X->Y |
| ** NOTE: this case means no availability/visibility ops are required when |
| it is the same (agent,reference). |
| |
| * X is a read, both X and Y are non-private, and X->Y |
| * X is a read, and X (transitively) system-synchronizes with Y |
| |
| * If R~X~ == R~Y~ and A~X~ and A~Y~ access a common memory domain D (e.g. |
| are in the same workgroup instance if D is the workgroup instance |
| domain), and both X and Y are non-private: |
| ** X is a write, Y is a write, AVC(A~X~,R~X~,D,L) is an availability chain |
| making (X,L) available to domain D, and X->^rcpo^AVC(A~X~,R~X~,D,L)->Y |
| ** X is a write, Y is a read, AVC(A~X~,R~X~,D,L) is an availability chain |
| making (X,L) available to domain D, VISC(A~Y~,R~Y~,D,L) is a visibility |
| chain making writes to L available in domain D visible to Y, and |
| X->^rcpo^AVC(A~X~,R~X~,D,L)->VISC(A~Y~,R~Y~,D,L)->^rcpo^Y |
| ** If |
| slink:VkPhysicalDeviceVulkanMemoryModelFeatures::pname:vulkanMemoryModelAvailabilityVisibilityChains |
| is ename:VK_FALSE, then AVC and VISC must: each only have a single |
| element in the chain, in each sub-bullet above. |
| |
| * Let D~X~ and D~Y~ each be either the device domain or the host domain, |
| depending on whether A~X~ and A~Y~ execute on the device or host: |
| ** X is a write and Y is a write, and |
| X->AV(A~X~,R~X~,D~X~,L)->DOM(D~X~,D~Y~)->Y |
| ** X is a write and Y is a read, and |
| X->AV(A~X~,R~X~,D~X~,L)->DOM(D~X~,D~Y~)->VIS(A~Y~,R~Y~,D~Y~,L)->Y |
| |
| [NOTE] |
| .Note |
| ==== |
| The final bullet (synchronization through device/host domain) requires |
| API-level synchronization operations, since the device/host domains are not |
| accessible via shader instructions. |
| Also, "`device domain`" is not to be confused with "`device scope`", which |
| synchronizes through the "`shader domain`". |
| ==== |
| |
| |
| [[memory-model-access-data-race]] |
| == Data Race |
| |
| Let X and Y be operations that access overlapping sets of memory locations |
| M, where X != Y, and at least one of X and Y is a write, and X and Y are not |
| mutually-ordered atomic operations. |
| If there does not exist a location-ordered relation between X and Y for each |
| location in M, then there is a _data race_. |
| |
| Applications must: ensure that no data races occur during the execution of |
| their application. |
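| |
| The definition can be sketched as a per-location check. |
| The Python below is illustrative only: `is_data_race`, the dictionary |
| layout of an access, and the representation of location-ordered as a set of |
| (first, second, location) triples are all assumptions made for the example. |
| |
```python
def is_data_race(x, y, location_ordered, mutually_ordered_atomics=False):
    """Return True if accesses x and y race under the definition above.

    x, y: dicts with keys "name", "locs" (set of locations), "is_write".
    location_ordered: set of (first_name, second_name, location) triples.
    """
    overlap = x["locs"] & y["locs"]
    if x is y or not overlap:
        return False
    if not (x["is_write"] or y["is_write"]) or mutually_ordered_atomics:
        return False
    for loc in overlap:
        ordered = ((x["name"], y["name"], loc) in location_ordered
                   or (y["name"], x["name"], loc) in location_ordered)
        if not ordered:
            return True   # some overlapping location has no ordering
    return False


x = {"name": "X", "locs": {0, 1, 2, 3}, "is_write": True}
y = {"name": "Y", "locs": {2, 3, 4, 5}, "is_write": False}
race = is_data_race(x, y, {("X", "Y", 2)})   # location 3 is left unordered
```
| |
| In this example the accesses overlap on locations 2 and 3, but only |
| location 2 is location-ordered, so the pair is reported as a data race. |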
| |
| [NOTE] |
| .Note |
| ==== |
| Data races can only occur due to instructions that are actually executed. |
| For example, an instruction skipped due to control flow must not contribute |
| to a data race. |
| ==== |
| |
| |
| [[memory-model-visible-to]] |
| == Visible-To |
| |
| Let X be a write and Y be a read whose sets of memory locations overlap, and |
| let M be the set of memory locations that overlap. |
| Let M~2~ be a non-empty subset of M. Then X is _visible-to_ Y for memory |
| locations M~2~ if and only if all of the following are true: |
| |
| * X is location-ordered before Y for each location L in M~2~. |
| * There does not exist another write Z to any location L in M~2~ such that |
| X is location-ordered before Z for location L and Z is location-ordered |
| before Y for location L. |
| |
| If X is visible-to Y, then Y reads the value written by X for locations |
| M~2~. |
| |
| [NOTE] |
| .Note |
| ==== |
| It is possible for there to be a write between X and Y that overwrites a |
| subset of the memory locations, but the remaining memory locations (M~2~) |
| will still be visible-to Y. |
| ==== |
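| |
| The two conditions, and the partial-overwrite case from the note, can be |
| sketched as follows. |
| This Python is illustrative only: `visible_to` is a hypothetical helper, |
| and location-ordered is assumed to be given as a set of (first, second, |
| location) triples. |
| |
```python
def visible_to(x, y, loc, location_ordered, writes):
    """Check whether write x is visible-to read y for a single location.

    location_ordered: set of (first, second, location) triples.
    writes: dict mapping each write's name to the set of locations it writes.
    """
    if (x, y, loc) not in location_ordered:
        return False
    for z, locs in writes.items():
        if z == x or loc not in locs:
            continue
        # An intervening write Z to this location hides X from Y.
        if (x, z, loc) in location_ordered and (z, y, loc) in location_ordered:
            return False
    return True


# X writes locations 0..3, Z overwrites 0 and 1, Y reads 0..3.
writes = {"X": {0, 1, 2, 3}, "Z": {0, 1}}
lo = {("X", "Y", l) for l in range(4)}
lo |= {("X", "Z", l) for l in (0, 1)} | {("Z", "Y", l) for l in (0, 1)}
m2 = {l for l in range(4) if visible_to("X", "Y", l, lo, writes)}
```
| |
| With these inputs, `m2` contains only locations 2 and 3, the part of X that |
| Z did not overwrite. |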
| |
| |
| [[memory-model-acyclicity]] |
| == Acyclicity |
| |
| _Reads-from_ is a relation between operations, where the first operation is |
| a write, the second operation is a read, and the second operation reads the |
| value written by the first operation. |
| _From-reads_ is a relation between operations, where the first operation is |
| a read, the second operation is a write, and the first operation reads a |
| value written earlier than the second operation in the second operation's |
| scoped modification order (or the first operation reads from the initial |
| value, and the second operation is any write to the same locations). |
| |
| Then the implementation must: guarantee that no cycles exist in the union of |
| the following relations: |
| |
| * location-ordered |
| * scoped modification order (over all atomic writes) |
| * reads-from |
| * from-reads |
| |
| [NOTE] |
| .Note |
| ==== |
| This is a "`consistency`" axiom, which informally guarantees that sequences |
| of operations cannot violate causality. |
| ==== |
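| |
| Checking this axiom for a candidate execution amounts to cycle detection on |
| the union of the four relations. |
| The Python sketch below is illustrative only: `has_cycle` is a hypothetical |
| helper, and each relation is assumed to be a set of (first, second) pairs |
| of operation names. |
| |
```python
def has_cycle(*relations):
    """Return True if the union of the given relations contains a cycle."""
    graph = {}
    for rel in relations:
        for a, b in rel:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set())

    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on the DFS stack, finished
    color = {n: WHITE for n in graph}

    def visit(n):
        color[n] = GREY
        for m in graph[n]:
            if color[m] == GREY:      # back edge: a cycle exists
                return True
            if color[m] == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)


# Write A is location-ordered before read B.
location_ordered = {("A", "B")}
# Forbidden outcome: B reads the initial value, so B from-reads A.
forbidden = has_cycle(location_ordered, set(), set(), {("B", "A")})
# Allowed outcome: B reads from A.
allowed = has_cycle(location_ordered, set(), {("A", "B")}, set())
```
| |
| The first execution contains the cycle A, B, A and is therefore ruled out; |
| the second contains no cycle. |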
| |
| |
| [[memory-model-scoped-modification-order-coherence]] |
| === Scoped Modification Order Coherence |
| |
| Let A and B be mutually-ordered atomic operations, where A is |
| location-ordered before B. Then the following rules are a consequence of |
| acyclicity: |
| |
| * If A and B are both reads and A does not read the initial value, then |
| the write that A takes its value from must: be earlier in its own scoped |
| modification order than (or the same as) the write that B takes its |
| value from (no cycles between location-order, reads-from, and |
| from-reads). |
| * If A is a read and B is a write and A does not read the initial value, |
| then A must: take its value from a write earlier than B in B's scoped |
| modification order (no cycles between location-order, scoped modification |
| order, and reads-from). |
| * If A is a write and B is a read, then B must: take its value from A or a |
| write later than A in A's scoped modification order (no cycles between |
| location-order, scoped modification order, and from-reads). |
| * If A and B are both writes, then A must: be earlier than B in A's scoped |
| modification order (no cycles between location-order and scoped |
| modification order). |
| * If A is a write and B is a read-modify-write and B reads the value |
| written by A, then B comes immediately after A in A's scoped |
| modification order (no cycles between scoped modification order and |
| from-reads). |
| |
| |
| [[memory-model-shader-io]] |
| == Shader I/O |
| |
| If a shader invocation A in a shader stage other than code:Vertex performs a |
| memory read operation X from an object in storage class |
| ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[] |
| code:CallableDataKHR, code:IncomingCallableDataKHR, code:RayPayloadKHR, |
| code:HitAttributeKHR, code:IncomingRayPayloadKHR, or |
| endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[] |
| code:Input, then X is system-synchronized-after all writes to the |
| corresponding |
| ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[] |
| code:CallableDataKHR, code:IncomingCallableDataKHR, code:RayPayloadKHR, |
| code:HitAttributeKHR, code:IncomingRayPayloadKHR, or |
| endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[] |
| code:Output storage variable(s) in the shader invocation(s) that contribute |
| to generating invocation A, and those writes are all visible-to X. |
| |
| [NOTE] |
| .Note |
| ==== |
| It is not necessary for the upstream shader invocations to have completed |
| execution; they only need to have generated the output that is being read. |
| ==== |
| |
| |
| [[memory-model-deallocation]] |
| == Deallocation |
| |
| ifndef::VKSC_VERSION_1_0[] |
| |
| A call to flink:vkFreeMemory must: happen-after all memory operations on all |
| memory locations in that slink:VkDeviceMemory object. |
| |
| [NOTE] |
| .Note |
| ==== |
| Normally, device memory operations in a given queue are synchronized with |
| flink:vkFreeMemory by having a host thread wait on a fence signaled by that |
| queue, and the wait happens-before the call to flink:vkFreeMemory on the |
| host. |
| ==== |
| |
| endif::VKSC_VERSION_1_0[] |
| |
| The deallocation of SPIR-V variables is managed by the system and |
| happens-after all operations on those variables. |
| |
| |
| [[memory-model-informative-descriptions]] |
| == Descriptions (Informative) |
| |
| This subsection offers more easily understandable consequences of the memory |
| model for app/compiler developers. |
| |
| Let SC be the storage class(es) specified by a release or acquire operation |
| or barrier. |
| |
| * An atomic write with release semantics must not be reordered against any |
| read or write to SC that is program-ordered before it (regardless of the |
| storage class the atomic is in). |
| |
| * An atomic read with acquire semantics must not be reordered against any |
| read or write to SC that is program-ordered after it (regardless of the |
| storage class the atomic is in). |
| |
| * Any write to SC program-ordered after a release barrier must not be |
| reordered against any read or write to SC program-ordered before that |
| barrier. |
| |
| * Any read from SC program-ordered before an acquire barrier must not be |
| reordered against any read or write to SC program-ordered after the |
| barrier. |
| |
| A control barrier (even if it has no memory semantics) must not be reordered |
| against any memory barriers. |
| |
| This memory model allows memory accesses with and without availability and |
| visibility operations, as well as atomic operations, all to be performed on |
| the same memory location. |
| This is critical to allow the model to reason about memory that is reused |
| in multiple ways, e.g. across the lifetime of different shader invocations |
| or draw calls. |
| While GLSL (and legacy SPIR-V) applies the "`coherent`" decoration to |
| variables (for historical reasons), this model treats each memory access |
| instruction as having optional implicit availability/visibility operations. |
| GLSL to SPIR-V compilers should map all (non-atomic) operations on a |
| coherent variable to Make{Pointer,Texel}\{Available}\{Visible} flags in this |
| model. |
| |
| Atomic operations implicitly have availability/visibility operations, and |
| the scope of those operations is taken from the atomic operation's scope. |
| |
| |
| [[memory-model-tessellation-output-ordering]] |
| == Tessellation Output Ordering |
| |
| For SPIR-V that uses the Vulkan Memory Model, the code:OutputMemory storage |
| class is used to synchronize accesses to tessellation control output |
| variables. |
| For legacy SPIR-V that does not enable the Vulkan Memory Model via |
| code:OpMemoryModel, tessellation outputs can be ordered using a control |
| barrier with no particular memory scope or semantics, as defined below. |
| |
| Let X and Y be memory operations performed by shader invocations A~X~ and |
| A~Y~. |
| Operation X is _tessellation-output-ordered_ before operation Y if and only |
| if all of the following are true: |
| |
| * There is a dynamic instance of an code:OpControlBarrier instruction C |
| such that X is program-ordered before C in A~X~ and C is program-ordered |
| before Y in A~Y~. |
| * A~X~ and A~Y~ are in the same instance of C's execution scope. |
| |
| If shader invocations A~X~ and A~Y~ in the code:TessellationControl |
| execution model execute memory operations X and Y, respectively, on the |
| code:Output storage class, and X is tessellation-output-ordered before Y |
| with a scope of code:Workgroup, then X is location-ordered before Y, and if |
| X is a write and Y is a read then X is visible-to Y. |
| |
| |
| ifdef::VK_NV_cooperative_matrix[] |
| [[memory-model-cooperative-matrix]] |
| == Cooperative Matrix Memory Access |
| |
| For each dynamic instance of a cooperative matrix load or store instruction |
| (code:OpCooperativeMatrixLoadNV or code:OpCooperativeMatrixStoreNV), a |
| single implementation-dependent invocation within the instance of the |
| matrix's scope performs a non-atomic load or store (respectively) to each |
| memory location that is defined to be accessed by the instruction. |
| endif::VK_NV_cooperative_matrix[] |