# SPIR-V Assembly language syntax

## Overview

The assembly attempts to adhere to the binary form from Section 3 of the SPIR-V
spec as closely as possible, with one exception aimed at improving the text's
readability: the `<result-id>` generated by an instruction is moved to the
beginning of that instruction and followed by an `=` sign. This makes it easier
to distinguish ID definitions from ID uses, and to locate value definitions.

Here is an example:

```
     OpCapability Shader
     OpMemoryModel Logical Simple
     OpEntryPoint GLCompute %3 "main"
     OpExecutionMode %3 LocalSize 64 64 1
%1 = OpTypeVoid
%2 = OpTypeFunction %1
%3 = OpFunction %1 None %2
%4 = OpLabel
     OpReturn
     OpFunctionEnd
```

A module is a sequence of instructions, separated by whitespace.
An instruction is an opcode name followed by operands, separated by
whitespace. Typically each instruction is presented on its own line,
but the assembler does not enforce this rule.

The opcode names and expected operands are described in Section 3 of
the SPIR-V specification. An operand is one of:
* a literal integer: a decimal integer or a hexadecimal integer.
  A hexadecimal integer is indicated by a leading `0x` or `0X`. A hex
  integer supplied for a signed integer value will be sign-extended.
  For example, `0xffff` supplied as the literal for an `OpConstant`
  on a signed 16-bit integer type will be interpreted as the value `-1`.
* a literal floating point number, in decimal or hexadecimal form.
  See [below](#floats).
* a literal string.
   * A literal string is everything following a double-quote `"` until the
     following un-escaped double-quote. This includes special characters such
     as newlines.
   * A backslash `\` may be used to escape characters in the string. A `\`
     may be used to escape a double-quote or another `\`. Before any other
     character the `\` is simply dropped and that character is kept as-is.
* a named enumerated value, specific to that operand position. For example,
  `OpMemoryModel` takes a named Addressing Model operand (e.g. `Logical` or
  `Physical32`), and a named Memory Model operand (e.g. `Simple` or `OpenCL`).
  Named enumerated values are only meaningful in specific positions, and will
  otherwise generate an error.
* a mask expression, consisting of one or more mask enum names separated
  by `|`. For example, the expression `NotNaN|NotInf|NSZ` denotes the mask
  which is the combination of the `NotNaN`, `NotInf`, and `NSZ` flags.
* an injected immediate integer: `!<integer>`. See [below](#immediate).
* an ID, e.g. `%foo`. See [below](#id).
* the name of an extended instruction. For example, `sqrt` in an extended
  instruction such as `%f = OpExtInst %f32 %OpenCLImport sqrt %arg`.
* the name of an opcode for `OpSpecConstantOp`, but where the `Op` prefix
  is removed. For example, the following indicates the use of an integer
  addition in a specialization constant computation:
  `%sum = OpSpecConstantOp %i32 IAdd %a %b`
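
Two of the operand rules above, sign-extending a hexadecimal literal supplied for a signed integer type and combining a mask expression, can be illustrated with a minimal Python sketch. The helper names `sign_extend` and `parse_mask` are hypothetical, and the table lists only the FP Fast Math Mode bits from the SPIR-V specification; this is not the assembler's own implementation.

```python
def sign_extend(value: int, width: int) -> int:
    """Interpret `value` as a `width`-bit two's-complement integer."""
    if value < 0 or value >> width:
        raise ValueError(f"{value:#x} does not fit in {width} bits")
    sign_bit = 1 << (width - 1)
    # Flipping the sign bit and subtracting it maps the upper half of the
    # unsigned range onto the negative values.
    return (value ^ sign_bit) - sign_bit


# FP Fast Math Mode mask bits, as listed in the SPIR-V specification.
FP_FAST_MATH_MODE = {
    "None": 0x0,
    "NotNaN": 0x1,
    "NotInf": 0x2,
    "NSZ": 0x4,
    "AllowRecip": 0x8,
    "Fast": 0x10,
}


def parse_mask(expression: str, names: dict) -> int:
    """OR together the flags named in an expression such as `NotNaN|NSZ`."""
    value = 0
    for name in expression.split("|"):
        if name not in names:
            raise ValueError(f"unknown mask name: {name}")
        value |= names[name]
    return value


print(sign_extend(0xffff, 16))                                  # -1
print(hex(parse_mask("NotNaN|NotInf|NSZ", FP_FAST_MATH_MODE)))  # 0x7
```

In this sketch a hex literal wider than the target type is rejected rather than truncated, which is a design choice of the example, not a statement about the assembler.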

## ID Definitions & Usage
<a name="id"></a>

An ID _definition_ pertains to the `<result-id>` of an instruction, and ID
_usage_ is a use of an ID as an input to an instruction.

An ID in the assembly language begins with `%` and must be followed by a name
consisting of one or more letters, digits, or underscore characters.

For every ID in the assembly program, the assembler generates a unique number
called the ID's internal number. Each ID reference then translates into its
internal number in the SPIR-V output. Internal numbers are unique within the
compilation unit: no two IDs in the same unit share an internal number.

The disassembler generates IDs whose names are always decimal numbers
greater than 0.

So the example above can be rewritten using more user-friendly names, as follows:

```
          OpCapability Shader
          OpMemoryModel Logical Simple
          OpEntryPoint GLCompute %main "main"
          OpExecutionMode %main LocalSize 64 64 1
  %void = OpTypeVoid
%fnMain = OpTypeFunction %void
  %main = OpFunction %void None %fnMain
%lbMain = OpLabel
          OpReturn
          OpFunctionEnd
```
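
The internal-number assignment described above can be sketched in Python. The sketch assumes, purely for illustration, that numbers are handed out in order of first appearance starting at 1; `ID_TOKEN` and `assign_internal_numbers` are hypothetical names, and the scan ignores the possibility of `%` appearing inside string literals.

```python
import re

# An ID is `%` followed by one or more letters, digits, or underscores.
ID_TOKEN = re.compile(r"%[A-Za-z0-9_]+")


def assign_internal_numbers(text: str) -> dict:
    """Map each ID name to a unique number, in order of first appearance."""
    numbers = {}
    for match in ID_TOKEN.finditer(text):
        name = match.group(0)
        if name not in numbers:
            numbers[name] = len(numbers) + 1
    return numbers


example = """
          OpEntryPoint GLCompute %main "main"
  %void = OpTypeVoid
%fnMain = OpTypeFunction %void
  %main = OpFunction %void None %fnMain
%lbMain = OpLabel
"""
print(assign_internal_numbers(example))
# {'%main': 1, '%void': 2, '%fnMain': 3, '%lbMain': 4}
```

Whatever order a real assembler uses, the property the text requires is the same: every distinct name maps to exactly one number, and no two names share one.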

## Floating point literals
<a name="floats"></a>

The assembler and disassembler support floating point literals in both
decimal and hexadecimal form.

The syntax for a floating point literal is the same as floating point
constants in the C programming language, except:
* An optional leading minus (`-`) is part of the literal.
* An optional type specifier suffix is not allowed.

Infinity and NaN values are expressed in hexadecimal float literals
by using the maximum representable exponent for the bit width.

For example, in 32-bit floating point, 8 bits are used for the exponent, and the
exponent bias is 127. So the maximum representable unbiased exponent is 128
(the all-ones exponent field, 255, minus the bias). The infinities and some
NaNs are therefore represented as follows:

```
%float32 = OpTypeFloat 32
    %inf = OpConstant %float32 0x1p+128
 %neginf = OpConstant %float32 -0x1p+128
   %aNaN = OpConstant %float32 0x1.8p+128
%moreNaN = OpConstant %float32 -0x1.0002p+128
```

The assembler preserves all the bits of a NaN value. For example, `%aNaN` in
the previous example is encoded as the word `0x7fc00000`, and `%moreNaN` is
encoded as `0xff800100`.
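
The bit patterns quoted above can be reproduced with a short Python sketch. It deliberately handles only 32-bit literals of the form `[-]0x1[.hhh]p+128`, packing the sign bit, the all-ones exponent field, and the hex fraction digits (left-aligned) into the 23-bit significand field. The name `f32_max_exponent_bits` is hypothetical and not part of any tool.

```python
import re

# Only literals whose exponent is the 32-bit maximum (+128) are accepted.
MAX_EXPONENT_LITERAL = re.compile(r"(-?)0x1(?:\.([0-9a-fA-F]*))?p\+128")


def f32_max_exponent_bits(literal: str) -> int:
    """Encode an infinity/NaN hex float literal as a 32-bit word."""
    match = MAX_EXPONENT_LITERAL.fullmatch(literal)
    if match is None:
        raise ValueError(f"unsupported literal: {literal}")
    sign = 1 if match.group(1) else 0
    digits = match.group(2) or ""
    fraction = int(digits, 16) if digits else 0
    nbits = 4 * len(digits)
    if nbits <= 23:
        # Left-align the hex digits in the 23-bit significand field.
        fraction <<= 23 - nbits
    else:
        dropped = nbits - 23
        if fraction & ((1 << dropped) - 1):
            raise ValueError("fraction needs more than 23 bits")
        fraction >>= dropped
    return (sign << 31) | (0xFF << 23) | fraction


for text in ["0x1p+128", "-0x1p+128", "0x1.8p+128", "-0x1.0002p+128"]:
    print(text, hex(f32_max_exponent_bits(text)))
# 0x1p+128 0x7f800000
# -0x1p+128 0xff800000
# 0x1.8p+128 0x7fc00000
# -0x1.0002p+128 0xff800100
```

A zero fraction gives an infinity; any nonzero fraction gives a NaN whose payload bits are kept exactly.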

The disassembler prints infinite, NaN, and subnormal values in hexadecimal form.
Zero and normal values are printed in decimal form with enough digits
to preserve all significand bits.
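
A Python sketch of the decimal path, assuming 9 significant digits (the value of C's `FLT_DECIMAL_DIG`, which is always enough for a 32-bit float to survive a print/parse round trip). The disassembler's own digit selection may differ, the helper names are hypothetical, and the hexadecimal path for infinities, NaNs, and subnormals is left out.

```python
import struct


def f32_from_bits(bits: int) -> float:
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def f32_to_bits(value: float) -> int:
    return struct.unpack("<I", struct.pack("<f", value))[0]


def format_f32_decimal(bits: int) -> str:
    """Print a zero or normal 32-bit float with 9 significant digits."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF or (exponent == 0 and fraction != 0):
        raise ValueError("infinities, NaNs and subnormals use hexadecimal form")
    return "%.9g" % f32_from_bits(bits)


for bits in [0x00000000, 0x3F800000, 0x3DCCCCCD, 0xC0200000, 0x00800000]:
    text = format_f32_decimal(bits)
    # Parsing the printed text must give back exactly the same bits.
    assert f32_to_bits(float(text)) == bits
    print(hex(bits), text)
```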

## Arbitrary Integers
<a name="immediate"></a>

When writing tests it can be useful to emit an invalid 32-bit word into the
binary stream at arbitrary positions within the assembly. To specify an
arbitrary word in the stream, use the prefix `!`, which takes the form
`!<integer>`. Here is an example:

```
OpCapability !0x0000FF00
```

Any token in a valid assembly program may be replaced by `!<integer>`, even
tokens that dictate how the rest of the instruction is parsed. Consider, for
example, the following assembly program:

```
%4 = OpConstant %1 123 456 789 OpExecutionMode %2 LocalSize 11 22 33
OpExecutionMode %3 InputLines
```

The tokens `OpConstant`, `LocalSize`, and `InputLines` may be replaced by arbitrary
`!<integer>` values, and the assembler will still assemble an output binary with
three instructions. It will not necessarily be valid SPIR-V, but it will
faithfully reflect the input text.

You may wonder how the assembler recognizes the instruction structure (including
instruction boundaries) in the text with certain crucial tokens replaced by
arbitrary integers. If, say, `OpConstant` becomes a `!<integer>` whose value
differs from the binary representation of `OpConstant` (remember that this
feature is intended for fine-grained control in SPIR-V testing), the assembler
generally has no idea what that value stands for. So how does it know there is
exactly one `<id>` and three number literals following in that instruction,
before the next one begins? And if `LocalSize` is replaced by an arbitrary
`!<integer>`, how does it know to take the next three tokens (instead of zero or
one, both of which are possible in the absence of certainty that `LocalSize` was
provided)? The answer is a simple rule governing the parsing of instructions
with `!<integer>` in them:

When a token in the assembly program is a `!<integer>`, that integer value is
emitted into the binary output, and parsing proceeds differently than before:
each subsequent token not recognized as an OpCode or a `<result-id>` is emitted
into the binary output without any checking; when a recognizable OpCode or a
`<result-id>` is eventually encountered, it begins a new instruction and parsing
returns to normal. (If a subsequent OpCode is never found, then this alternate
parsing mode handles all the remaining tokens in the program.)

The assembler processes the tokens encountered in alternate parsing mode as
follows:

* If the token is a number literal, then because the instruction's context may
  be lost, the number is interpreted as a 32-bit value and output as a single
  word. In order to specify multiple-word literals in alternate-parsing mode,
  further uses of `!<integer>` tokens may be required.
  All formats supported by `strtoul()` are accepted.
* If the token is a string literal, it outputs a sequence of words representing
  the string as defined in the SPIR-V specification for Literal String.
* If the token is an ID, it outputs the ID's internal number.
* If the token is another `!<integer>`, it outputs that integer.
* Any other token causes the assembler to quit with an error.
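
The per-token rules above can be sketched in Python. The helper names are hypothetical. The number parser accepts only the unsigned decimal, hexadecimal (`0x`), and octal (leading `0`) spellings that `strtoul()` reads with base 0, not everything `strtoul()` tolerates. New IDs are numbered in order of first appearance, which is an illustrative assumption. The string encoding follows the SPIR-V Literal String rule: UTF-8 bytes, a nul terminator, zero padding, packed four per word with the first byte in the lowest-order bits.

```python
import re

# Unsigned decimal, hex (0x/0X) or octal (leading 0), as strtoul(s, NULL, 0) reads them.
NUMBER = re.compile(r"0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*")


def parse_word(text: str) -> int:
    """Parse a number literal as a single 32-bit word."""
    if not NUMBER.fullmatch(text):
        raise ValueError(f"not a number: {text}")
    if text[:2] in ("0x", "0X"):
        value = int(text, 16)
    elif text.startswith("0"):
        value = int(text, 8)
    else:
        value = int(text, 10)
    if value > 0xFFFFFFFF:
        raise ValueError(f"{text} does not fit in 32 bits")
    return value


def unescape(body: str) -> str:
    """Drop each unescaped backslash and keep the character after it."""
    out, escaping = [], False
    for ch in body:
        if ch == "\\" and not escaping:
            escaping = True
        else:
            out.append(ch)
            escaping = False
    return "".join(out)


def literal_string_words(text: str) -> list:
    """UTF-8 bytes plus a nul terminator, zero-padded to little-endian words."""
    data = text.encode("utf-8") + b"\0"
    data += b"\0" * (-len(data) % 4)
    return [int.from_bytes(data[i:i + 4], "little") for i in range(0, len(data), 4)]


def alternate_mode_words(token: str, ids: dict) -> list:
    """Words emitted for one token in alternate parsing mode."""
    if token.startswith("!"):
        return [parse_word(token[1:])]
    if len(token) >= 2 and token[0] == token[-1] == '"':
        return literal_string_words(unescape(token[1:-1]))
    if token.startswith("%"):
        # Hypothetical numbering: a new ID gets the next unused number.
        return [ids.setdefault(token, len(ids) + 1)]
    if NUMBER.fullmatch(token):
        return [parse_word(token)]
    raise ValueError(f"token not allowed in alternate parsing mode: {token}")


ids, words = {}, []
for token in ['!262187', '%1', '%2', '"abc"', '!327739', '%1', '%3', '6', '%2']:
    words.extend(alternate_mode_words(token, ids))
print([hex(w) for w in words])
# ['0x4002b', '0x1', '0x2', '0x636261', '0x5003b', '0x1', '0x3', '0x6', '0x2']
```

An enumerant such as `DontInline` falls through to the final error, matching the rule that any other token is rejected.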

Note that this has some interesting consequences, including:

* When an OpCode is replaced by `!<integer>`, the integer value should encode
  the instruction's first word: the word count in the high 16 bits and the
  opcode in the low 16 bits, as specified in the physical-layout section of
  the SPIR-V specification.

* Consecutive instructions may have their OpCode replaced by `!<integer>` and
  still produce valid SPIR-V. For example,
  `!262187 %1 %2 "abc" !327739 %1 %3 6 %2` will successfully assemble into
  SPIR-V declaring a constant and a variable in the `Private` storage class
  (value 6, formerly named `PrivateGlobal`). Here 262187 is `0x0004002B`
  (4 words, opcode 43 = `OpConstant`) and 327739 is `0x0005003B` (5 words,
  opcode 59 = `OpVariable`).

* Enums (such as `DontInline` or `SubgroupMemory`) are not handled by the
  alternate parsing mode. They must be replaced by `!<integer>` for
  successful assembly.

* The `<result-id>` on the left-hand side of an assignment cannot be a
  `!<integer>`. The `<result-id>` can still be manually controlled if desired
  by expressing the entire instruction as `!<integer>` tokens for its opcode and
  operands.

* The `=` sign cannot be processed by the alternate parsing mode if the OpCode
  following it is a `!<integer>`.

* When replacing a named ID with `!<integer>`, it is possible to generate
  unintentionally valid SPIR-V. If the integer provided happens to equal a
  number generated for an existing named ID, it will result in a reference to
  that named ID being output. This may be valid SPIR-V, contrary to the
  presumed intention of the writer.
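
The first-word layout from the first consequence above is easy to check with a small Python sketch. The opcode numbers are taken from the SPIR-V specification; the helper names and the three-entry table are illustrative only.

```python
# A few opcode numbers from the SPIR-V specification.
OPCODES = {"OpCapability": 17, "OpConstant": 43, "OpVariable": 59}


def first_word(word_count: int, opcode: int) -> int:
    """Word count in the high 16 bits, opcode in the low 16 bits."""
    return (word_count << 16) | opcode


def split_first_word(word: int) -> tuple:
    """Inverse of first_word: (word_count, opcode)."""
    return word >> 16, word & 0xFFFF


print(first_word(4, OPCODES["OpConstant"]))  # 262187
print(first_word(5, OPCODES["OpVariable"]))  # 327739
print(split_first_word(262187))              # (4, 43)
# `OpCapability !0x0000FF00` is two words: 0x00020011 followed by 0x0000FF00.
print(hex(first_word(2, OPCODES["OpCapability"])))  # 0x20011
```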

## Notes

* Some enumerants cannot be used by name, because the target instruction
  in which they are meaningful takes an ID reference instead of a literal value.
  For example:
  * Named enumerated value `CmdExecTime` from section 3.30 Kernel
    Profiling Info is used in constructing a mask value supplied as
    an ID for `OpCaptureEventProfilingInfo`. But no other instruction
    has enough context to bring the enumerant names from section 3.30
    into scope.
  * Similarly, the names in section 3.29 Kernel Enqueue Flags are used to
    construct a value supplied as an ID to the Flags argument of
    `OpEnqueueKernel`.
  * Similarly for the names in section 3.25 Memory Semantics.
  * Similarly for the names in section 3.27 Scope.
* Some enumerants cannot be used by name, because they only name values
  returned by an instruction:
  * Enumerants from 3.12 Image Channel Order name possible values returned
    by the `OpImageQueryOrder` instruction.
  * Enumerants from 3.13 Image Channel Data Type name possible values
    returned by the `OpImageQueryFormat` instruction.