| # Chapter 4: Enabling Generic Transformation with Interfaces |
| |
| [TOC] |
| |
| ## Background: Grappling with an Extensible IR |
| |
| Through dialects, MLIR allows for the representation of many different levels of |
| abstraction; the Toy dialect that we have previously defined is one such |
| example. Though these different dialects may represent different abstractions, |
| there is often a set of common transformations and analyses that we would like |
| to perform. The problem that arises is that naively implementing each |
| transformation for each dialect leads to large amounts of code duplication, as |
| the internal algorithms are generally very similar, if not the same. We would |
| like to provide the ability for transformations to opaquely hook into dialects |
| like Toy to get the information they need. |
| |
MLIR provides a set of always-available hooks for certain core transformations,
| as seen in the [previous chapter](Ch-3.md), where we registered some |
| canonicalizations via a hook on our operations (`getCanonicalizationPatterns`). |
| However, these types of hooks don't really scale well. Therefore, a more generic |
| solution was designed, in the form of [interfaces](../../Interfaces.md), to make |
| the MLIR infrastructure as extensible as the representation. Interfaces provide |
| a generic mechanism for dialects and operations to provide information to a |
| transformation or analysis. |
| |
| ## Shape Inference: Preparing for Code Generation |
| |
| Our Toy IR currently operates on generic tensors, meaning that we don't know the |
| shape of tensors other than during the initialization of constants. This |
| complicates optimizations, as well as code generation. Fortunately, we can |
| simply propagate the shapes through the computation until they are all known. |
| The issue is how to handle calls to user-defined generic functions: every call |
| site could deduce different shapes. One possibility would be to perform symbolic |
| inference based on the argument types, but this would be hard to generalize if |
| we were to introduce more control flow in the language. Another approach would |
| be function specialization, where every call site with new argument shapes |
| duplicates the called function and specializes it. The approach we take for Toy |
| is to inline all of the function calls, then perform intraprocedural shape |
| propagation. |
| |
| ### Inlining |
| |
Here we could write an inlining algorithm specifically designed for the Toy
dialect, but that can become quite involved depending on the level of
sophistication we want. Disregarding cost modeling, the pure structural
| transformation is already complex to implement from scratch. Thankfully, MLIR |
| provides a generic inliner algorithm that dialects can plug into. All we need to |
| do in Toy is to provide the [interfaces](../../Interfaces.md) for the inliner to |
| hook into. |
| |
| The first thing we need to do is to define the constraints on inlining |
| operations in the Toy dialect. This information is provided through a |
| [dialect interface](../../Interfaces.md#dialect-interfaces). This is essentially |
| a class containing a set of virtual hooks which the dialect can override. |
| In this case, the interface is `DialectInlinerInterface`. |
| |
| ```c++ |
| /// This class defines the interface for handling inlining with Toy operations. |
/// We simply inherit from the base interface class and override
| /// the necessary methods. |
| struct ToyInlinerInterface : public DialectInlinerInterface { |
| using DialectInlinerInterface::DialectInlinerInterface; |
| |
| /// This hook checks to see if the given callable operation is legal to inline |
| /// into the given call. For Toy this hook can simply return true, as the Toy |
| /// Call operation is always inlinable. |
| bool isLegalToInline(Operation *call, Operation *callable, |
| bool wouldBeCloned) const final { |
| return true; |
| } |
| |
| /// This hook checks to see if the given operation is legal to inline into the |
| /// given region. For Toy this hook can simply return true, as all Toy |
| /// operations are inlinable. |
| bool isLegalToInline(Operation *, Region *, bool, |
| BlockAndValueMapping &) const final { |
| return true; |
| } |
| |
| /// This hook is called when a terminator operation has been inlined. The only |
| /// terminator that we have in the Toy dialect is the return |
  /// operation (toy.return). We handle the return by replacing the values
| /// previously returned by the call operation with the operands of the |
| /// return. |
| void handleTerminator(Operation *op, |
| ArrayRef<Value> valuesToRepl) const final { |
| // Only "toy.return" needs to be handled here. |
| auto returnOp = cast<ReturnOp>(op); |
| |
| // Replace the values directly with the return operands. |
| assert(returnOp.getNumOperands() == valuesToRepl.size()); |
| for (const auto &it : llvm::enumerate(returnOp.getOperands())) |
| valuesToRepl[it.index()].replaceAllUsesWith(it.value()); |
| } |
| }; |
| ``` |
| |
| We then register our dialect interface directly on the Toy dialect, similarly to |
| how we did for operations. |
| |
| ```c++ |
| ToyDialect::ToyDialect(mlir::MLIRContext *ctx) : mlir::Dialect("toy", ctx) { |
| addInterfaces<ToyInlinerInterface>(); |
| } |
| ``` |
| |
| Next, we need to provide a way for the inliner to know that `toy.generic_call` |
| represents a call to a function. MLIR provides an |
| [operation interface](../../Interfaces.md#operation-interfaces) that can be used |
| to mark an operation as being "call-like". Unlike dialect interfaces, operation |
| interfaces provide a more refined granularity of information that is specific |
| and core to a single operation. The interface that we will be adding here is the |
| `CallOpInterface`. |
| |
To add this interface we just need to include the definition in our operation
specification file (`Ops.td`):
| |
| ```tablegen |
| include "mlir/Interfaces/CallInterfaces.td" |
| ``` |
| |
| and add it to the traits list of `GenericCallOp`: |
| |
| ```tablegen |
| def GenericCallOp : Toy_Op<"generic_call", |
| [DeclareOpInterfaceMethods<CallOpInterface>]> { |
| ... |
| } |
| ``` |
| |
In the above we also use the `DeclareOpInterfaceMethods` directive to
auto-declare all of the interface methods in the class declaration of
`GenericCallOp`. This means that we just need to provide a definition:
| |
| ```c++ |
/// Return the callee of the generic call operation; this is required by the
| /// call interface. |
| CallInterfaceCallable GenericCallOp::getCallableForCallee() { |
| return getAttrOfType<SymbolRefAttr>("callee"); |
| } |
| |
/// Get the argument operands to the called function; this is required by the
| /// call interface. |
| Operation::operand_range GenericCallOp::getArgOperands() { return inputs(); } |
| ``` |
| |
| Now that the inliner has been informed about the Toy dialect, we can add the |
| inliner pass to the pass manager for Toy: |
| |
| ```c++ |
| pm.addPass(mlir::createInlinerPass()); |
| ``` |
| |
| Now let's look at a working example: |
| |
| ```mlir |
| func @multiply_transpose(%arg0: tensor<*xf64>, %arg1: tensor<*xf64>) -> tensor<*xf64> { |
| %0 = toy.transpose(%arg0 : tensor<*xf64>) to tensor<*xf64> |
| %1 = toy.transpose(%arg1 : tensor<*xf64>) to tensor<*xf64> |
| %2 = toy.mul %0, %1 : tensor<*xf64> |
| toy.return %2 : tensor<*xf64> |
| } |
| func @main() { |
| %0 = toy.constant dense<[[1.000000e+00, 2.000000e+00, 3.000000e+00], [4.000000e+00, 5.000000e+00, 6.000000e+00]]> : tensor<2x3xf64> |
| %1 = toy.reshape(%0 : tensor<2x3xf64>) to tensor<2x3xf64> |
| %2 = toy.constant dense<[1.000000e+00, 2.000000e+00, 3.000000e+00, 4.000000e+00, 5.000000e+00, 6.000000e+00]> : tensor<6xf64> |
| %3 = toy.reshape(%2 : tensor<6xf64>) to tensor<2x3xf64> |
| %4 = toy.generic_call @multiply_transpose(%1, %3) : (tensor<2x3xf64>, tensor<2x3xf64>) -> tensor<*xf64> |
| %5 = toy.generic_call @multiply_transpose(%3, %1) : (tensor<2x3xf64>, tensor<2x3xf64>) -> tensor<*xf64> |
| toy.print %5 : tensor<*xf64> |
| toy.return |
| } |
| ``` |
| |
We have two calls to `multiply_transpose` that we would like to inline into
`main`, but if we look at the output nothing has changed. We are missing one
last subtle piece: there is a hidden type conversion on the edge of the call. If
we look at the above, the operands to the `generic_call` are of type
`tensor<2x3xf64>`, while the inputs to the function expect `tensor<*xf64>`. To
resolve this difference, the inliner expects an explicit cast operation to be
inserted. For this, we need to add a new operation to the Toy dialect,
`ToyCastOp` (`toy.cast`), to represent casts between two different shapes.
| |
| ```tablegen |
| def CastOp : Toy_Op<"cast", [NoSideEffect, SameOperandsAndResultShape]> { |
| let summary = "shape cast operation"; |
| let description = [{ |
| The "cast" operation converts a tensor from one type to an equivalent type |
| without changing any data elements. The source and destination types |
| must both be tensor types with the same element type. If both are ranked |
| then the rank should be the same and static dimensions should match. The |
| operation is invalid if converting to a mismatching constant dimension. |
| }]; |
| |
| let arguments = (ins F64Tensor:$input); |
| let results = (outs F64Tensor:$output); |
| |
| // Set the folder bit so that we can fold redundant cast operations. |
| let hasFolder = 1; |
| } |
| ``` |
| |
We can then override the necessary hook on `ToyInlinerInterface` to insert a
cast for us when needed:
| |
| ```c++ |
| struct ToyInlinerInterface : public DialectInlinerInterface { |
| ... |
| |
| /// Attempts to materialize a conversion for a type mismatch between a call |
| /// from this dialect, and a callable region. This method should generate an |
| /// operation that takes 'input' as the only operand, and produces a single |
| /// result of 'resultType'. If a conversion can not be generated, nullptr |
| /// should be returned. |
| Operation *materializeCallConversion(OpBuilder &builder, Value input, |
| Type resultType, |
| Location conversionLoc) const final { |
| return builder.create<CastOp>(conversionLoc, resultType, input); |
| } |
| }; |
| ``` |
| |
If we run the working example through the pipeline again, we get the expected result:
| |
| ```mlir |
| func @main() { |
| %0 = "toy.constant"() {value = dense<[[1.000000e+00, 2.000000e+00, 3.000000e+00], [4.000000e+00, 5.000000e+00, 6.000000e+00]]> : tensor<2x3xf64>} : () -> tensor<2x3xf64> |
| %1 = "toy.constant"() {value = dense<[[1.000000e+00, 2.000000e+00, 3.000000e+00], [4.000000e+00, 5.000000e+00, 6.000000e+00]]> : tensor<2x3xf64>} : () -> tensor<2x3xf64> |
| %2 = "toy.cast"(%1) : (tensor<2x3xf64>) -> tensor<*xf64> |
| %3 = "toy.cast"(%0) : (tensor<2x3xf64>) -> tensor<*xf64> |
| %4 = "toy.transpose"(%2) : (tensor<*xf64>) -> tensor<*xf64> |
| %5 = "toy.transpose"(%3) : (tensor<*xf64>) -> tensor<*xf64> |
| %6 = "toy.mul"(%4, %5) : (tensor<*xf64>, tensor<*xf64>) -> tensor<*xf64> |
| toy.print %6 : tensor<*xf64> |
| toy.return |
| } |
| ``` |
| |
| NOTE: The generic inliner will also perform simplifications, so the output may |
| be a bit cleaner than expected. |
| |
| ### Intraprocedural Shape Inference |
| |
| Now that we have inlined all of the functions, we are left with a main function |
| containing a mix of static and dynamically shaped operations. We can now write a |
| simple shape inference pass to propagate shapes intraprocedurally (within a |
| single function). We could write this as a pass that directly encodes the |
| constraints of the operations within the Toy dialect, but this seems like a good |
| candidate for a transformation that could be written generically. As a good rule |
| of thumb, it is best to express a transformation as generically as possible, |
| such that it can be extended to other dialects in the future. There is no |
| telling how many other dialects may have similar needs or encounter the same |
| problems. |
| |
| For shape inference, if we break down the problem to its core, we really just |
| want operations to tell us the expected outputs given a set of statically known |
| inputs. (We can definitely get more complex than that, but for our needs we can |
| keep it simple.) Given that this property is core to a specific operation, we |
| can define an operation interface that can be specified on operations that need |
| to have their result shapes inferred. |
| |
As with operations, we can also
| [define operation interfaces](../../OpDefinitions.md#operation-interfaces) using |
| the operation definition specification (ODS) framework. |
| |
| The interface is defined by inheriting from `OpInterface`, which takes the name |
| to be given to the generated C++ interface class as a template argument. For our |
| purposes, we will simply name the generated class `ShapeInference`. We also |
| provide a description for the interface. |
| |
| ```tablegen |
| def ShapeInferenceOpInterface : OpInterface<"ShapeInference"> { |
| let description = [{ |
| Interface to access a registered method to infer the return types for an |
| operation that can be used during type inference. |
| }]; |
| } |
| ``` |
| |
| Next, we define the interface methods that the operations will need to provide. |
An interface method comprises: a description; a C++ return type in string
| form; a method name in string form; and a few optional components, depending on |
| the need. See the |
| [ODS documentation](../../OpDefinitions.md#operation-interfaces) for more |
| information. |
| |
| ```tablegen |
| def ShapeInferenceOpInterface : OpInterface<"ShapeInference"> { |
| ... |
| |
| let methods = [ |
| InterfaceMethod<"Infer and set the output shape for the current operation.", |
| "void", "inferShapes"> |
| ]; |
| } |
| ``` |
| |
| Now that the interface is defined, we can add it to the necessary Toy operations |
in a similar way to how we added the `CallOpInterface` to `GenericCallOp`:
| |
| ```tablegen |
| def MulOp : Toy_Op<"mul", |
| [..., DeclareOpInterfaceMethods<ShapeInferenceOpInterface>]> { |
| ... |
| } |
| ``` |
| |
| Each of these operations will then need to provide a definition for the |
| `inferShapes()` method. As an example, for the mul op, the result shape is |
| inferred as the shape of the inputs. |
| |
| ```c++ |
/// Infer the output shape of the MulOp; this is required by the shape inference
| /// interface. |
| void MulOp::inferShapes() { getResult().setType(getOperand(0).getType()); } |
| ``` |
| |
At this point, each of the necessary Toy operations provides a mechanism by which
to infer its output shapes. The `ShapeInferencePass` is a `FunctionPass`: it will
run on each function in isolation. MLIR also supports general
| [OperationPasses](../../PassManagement.md#operation-pass) that run on any isolated |
| operation (i.e. other function-like operations), but here our module only |
| contains functions, so there is no need to generalize to all operations. |
| |
Implementing such a pass is done by creating a class inheriting from
`mlir::FunctionPass` (here via the `PassWrapper` CRTP utility) and overriding
the `runOnFunction()` method.
| |
| ```c++ |
| class ShapeInferencePass |
| : public mlir::PassWrapper<ShapeInferencePass, FunctionPass> { |
| void runOnFunction() override { |
| FuncOp function = getFunction(); |
| ... |
| } |
| }; |
| ``` |
| |
While we're at it, let's also create a helper method for instantiating the pass:
| |
| ```c++ |
| std::unique_ptr<mlir::Pass> mlir::toy::createShapeInferencePass() { |
| return std::make_unique<ShapeInferencePass>(); |
| } |
| ``` |
| |
The shape inference algorithm operates as follows (a condensed sketch of the
corresponding pass body follows the list):
| |
| 1. Build a worklist containing all the operations that return a dynamically |
| shaped tensor: these are the operations that need shape inference. |
2. Iterate over the worklist:
    - find an operation to process: the next ready operation in the worklist
      is one whose arguments are all non-generic (i.e. statically shaped),
| - if no operation is found, break out of the loop, |
| - remove the operation from the worklist, |
| - infer the shape of its output from the argument types. |
| 3. If the worklist is empty, the algorithm succeeded. |
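
Below is a condensed sketch of what this algorithm looks like in the pass body.
It is modeled on the tutorial's `ShapeInferencePass`, with the helper predicates
written inline; treat the exact details as illustrative:

```c++
class ShapeInferencePass
    : public mlir::PassWrapper<ShapeInferencePass, FunctionPass> {
public:
  /// An operation is ready to process once all of its operands are ranked,
  /// i.e. their shapes have already been inferred.
  static bool allOperandsInferred(Operation *op) {
    return llvm::all_of(op->getOperandTypes(), [](Type operandType) {
      return operandType.isa<RankedTensorType>();
    });
  }

  /// An operation still needs inference if any of its results is an unranked
  /// tensor.
  static bool returnsDynamicShape(Operation *op) {
    return llvm::any_of(op->getResultTypes(), [](Type resultType) {
      return !resultType.isa<RankedTensorType>();
    });
  }

  void runOnFunction() override {
    FuncOp f = getFunction();

    // Step 1: build the worklist from all operations that return a
    // dynamically shaped tensor.
    llvm::SmallPtrSet<Operation *, 16> opWorklist;
    f.walk([&](Operation *op) {
      if (returnsDynamicShape(op))
        opWorklist.insert(op);
    });

    // Step 2: repeatedly pick the next ready operation, infer its shape, and
    // remove it from the worklist.
    while (!opWorklist.empty()) {
      auto nextop = llvm::find_if(opWorklist, allOperandsInferred);
      if (nextop == opWorklist.end())
        break;
      Operation *op = *nextop;
      opWorklist.erase(op);

      // Ask the operation to infer its output shapes via the interface; the
      // full version, with error handling, is shown in the snippet below.
      if (ShapeInference shapeOp = dyn_cast<ShapeInference>(op))
        shapeOp.inferShapes();
    }

    // Step 3: an empty worklist means every shape was inferred.
    if (!opWorklist.empty())
      signalPassFailure();
  }
};
```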
| |
When processing an operation as described above, we query whether it has
registered the `ShapeInference` interface, using this code snippet:
| |
| ```c++ |
| // Ask the operation to infer its output shapes. |
| LLVM_DEBUG(llvm::dbgs() << "Inferring shape for: " << *op << "\n"); |
| |
// We check if an operation has a particular interface by casting.
| if (ShapeInference shapeOp = dyn_cast<ShapeInference>(op)) { |
| shapeOp.inferShapes(); |
| } else { |
| op->emitError("unable to infer shape of operation without shape " |
| "inference interface"); |
| return signalPassFailure(); |
| } |
| ``` |
| |
| We can then add our pass to the pass manager: |
| |
| ```c++ |
| pm.addPass(mlir::createShapeInferencePass()); |
| ``` |
| |
| If we rerun our original example, we now get the following: |
| |
| ```mlir |
| func @main() { |
| %0 = "toy.constant"() {value = dense<[[1.000000e+00, 2.000000e+00, 3.000000e+00], [4.000000e+00, 5.000000e+00, 6.000000e+00]]> : tensor<2x3xf64>} : () -> tensor<2x3xf64> |
| %1 = "toy.transpose"(%0) : (tensor<2x3xf64>) -> tensor<3x2xf64> |
| %2 = "toy.mul"(%1, %1) : (tensor<3x2xf64>, tensor<3x2xf64>) -> tensor<3x2xf64> |
| toy.print %2 : tensor<3x2xf64> |
| toy.return |
| } |
| ``` |
| |
You can build `toyc-ch4` and try it yourself: `toyc-ch4
test/Examples/Toy/Ch4/codegen.toy -emit=mlir -opt`.
| |
| In the [next chapter](Ch-5.md), we will start the process of code generation by |
targeting a lower-level dialect for optimizing some of the more compute-heavy
| Toy operations. |