| # Chapter 4: Enabling Generic Transformation with Interfaces |
| |
| [TOC] |
| |
| ## Background: Grappling with an Extensible IR |
| |
| Through dialects, MLIR allows for the representation of many different levels of |
| abstraction; the Toy dialect that we have previously defined is one such |
| example. Though these different dialects may represent different abstractions, |
| there is often a set of common transformations and analyses that we would like |
| to perform. The problem that arises is that naively implementing each |
| transformation for each dialect leads to large amounts of code duplication, as |
| the internal algorithms are generally very similar, if not the same. We would |
| like to provide the ability for transformations to opaquely hook into dialects |
| like Toy to get the information they need. |
| |
MLIR provides a set of always-available hooks for certain core transformations,
| as seen in the [previous chapter](Ch-3.md), where we registered some |
| canonicalizations via a hook on our operations (`getCanonicalizationPatterns`). |
| However, these types of hooks don't really scale well. Therefore, a more generic |
| solution was designed, in the form of [interfaces](../../Interfaces.md), to make |
| the MLIR infrastructure as extensible as the representation. Interfaces provide |
| a generic mechanism for dialects and operations to provide information to a |
| transformation or analysis. |
| |
| ## Shape Inference: Preparing for Code Generation |
| |
| Our Toy IR currently operates on generic tensors, meaning that we don't know the |
| shape of tensors other than during the initialization of constants. This |
| complicates optimizations, as well as code generation. Fortunately, we can |
| simply propagate the shapes through the computation until they are all known. |
| The issue is how to handle calls to user-defined generic functions: every call |
| site could deduce different shapes. One possibility would be to perform symbolic |
| inference based on the argument types, but this would be hard to generalize if |
| we were to introduce more control flow in the language. Another approach would |
| be function specialization, where every call site with new argument shapes |
| duplicates the called function and specializes it. The approach we take for Toy |
| is to inline all of the function calls, then perform intraprocedural shape |
| propagation. |
| |
| ### Inlining |
| |
Here we could write an inlining algorithm specifically designed for the Toy
dialect, but that can become quite involved depending on the level of
sophistication we want. Disregarding cost modeling, the pure structural
| transformation is already complex to implement from scratch. Thankfully, MLIR |
| provides a generic inliner algorithm that dialects can plug into. All we need to |
| do in Toy is to provide the [interfaces](../../Interfaces.md) for the inliner to |
| hook into. |
| |
| The first thing we need to do is to define the constraints on inlining |
| operations in the Toy dialect. This information is provided through a |
| [dialect interface](../../Interfaces.md#dialect-interfaces). This is essentially |
| a class containing a set of virtual hooks which the dialect can override. |
| In this case, the interface is `DialectInlinerInterface`. |
| |
| ```c++ |
| /// This class defines the interface for handling inlining with Toy operations. |
/// We simply inherit from the base interface class and override
| /// the necessary methods. |
| struct ToyInlinerInterface : public DialectInlinerInterface { |
| using DialectInlinerInterface::DialectInlinerInterface; |
| |
| /// This hook checks to see if the given callable operation is legal to inline |
| /// into the given call. For Toy this hook can simply return true, as the Toy |
| /// Call operation is always inlinable. |
| bool isLegalToInline(Operation *call, Operation *callable, |
| bool wouldBeCloned) const final { |
| return true; |
| } |
| |
| /// This hook checks to see if the given operation is legal to inline into the |
| /// given region. For Toy this hook can simply return true, as all Toy |
| /// operations are inlinable. |
| bool isLegalToInline(Operation *, Region *, bool, |
| BlockAndValueMapping &) const final { |
| return true; |
| } |
| |
| /// This hook is called when a terminator operation has been inlined. The only |
| /// terminator that we have in the Toy dialect is the return |
  /// operation (toy.return). We handle the return by replacing the values
| /// previously returned by the call operation with the operands of the |
| /// return. |
| void handleTerminator(Operation *op, |
| ArrayRef<Value> valuesToRepl) const final { |
| // Only "toy.return" needs to be handled here. |
| auto returnOp = cast<ReturnOp>(op); |
| |
| // Replace the values directly with the return operands. |
| assert(returnOp.getNumOperands() == valuesToRepl.size()); |
| for (const auto &it : llvm::enumerate(returnOp.getOperands())) |
| valuesToRepl[it.index()].replaceAllUsesWith(it.value()); |
| } |
| }; |
| ``` |
| |
| We then register our dialect interface directly on the Toy dialect, similarly to |
| how we did for operations. |
| |
| ```c++ |
| ToyDialect::ToyDialect(mlir::MLIRContext *ctx) : mlir::Dialect("toy", ctx) { |
| addInterfaces<ToyInlinerInterface>(); |
| } |
| ``` |
| |
| Next, we need to provide a way for the inliner to know that `toy.generic_call` |
| represents a call to a function. MLIR provides an |
| [operation interface](../../Interfaces.md#operation-interfaces) that can be used |
| to mark an operation as being "call-like". Unlike dialect interfaces, operation |
| interfaces provide a more refined granularity of information that is specific |
| and core to a single operation. The interface that we will be adding here is the |
| `CallOpInterface`. |
| |
To add this interface we just need to include the definition in our operation
specification file (`Ops.td`):
| |
| ```tablegen |
| include "mlir/Interfaces/CallInterfaces.td" |
| ``` |
| |
| and add it to the traits list of `GenericCallOp`: |
| |
| ```tablegen |
| def GenericCallOp : Toy_Op<"generic_call", |
| [DeclareOpInterfaceMethods<CallOpInterface>]> { |
| ... |
| } |
| ``` |
| |
In the above we also use the `DeclareOpInterfaceMethods` directive to
auto-declare all of the interface methods in the class declaration of
`GenericCallOp`. This means that we just need to provide a definition:
| |
| ```c++ |
/// Return the callee of the generic call operation; this is required by the
| /// call interface. |
| CallInterfaceCallable GenericCallOp::getCallableForCallee() { |
| return getAttrOfType<SymbolRefAttr>("callee"); |
| } |
| |
/// Get the argument operands to the called function; this is required by the
| /// call interface. |
| Operation::operand_range GenericCallOp::getArgOperands() { return inputs(); } |
| ``` |
| |
| Now that the inliner has been informed about the Toy dialect, we can add the |
| inliner pass to the pass manager for Toy: |
| |
| ```c++ |
| pm.addPass(mlir::createInlinerPass()); |
| ``` |
| |
| Now let's look at a working example: |
| |
| ```mlir |
| func @multiply_transpose(%arg0: tensor<*xf64>, %arg1: tensor<*xf64>) -> tensor<*xf64> { |
| %0 = toy.transpose(%arg0 : tensor<*xf64>) to tensor<*xf64> |
| %1 = toy.transpose(%arg1 : tensor<*xf64>) to tensor<*xf64> |
| %2 = toy.mul %0, %1 : tensor<*xf64> |
| toy.return %2 : tensor<*xf64> |
| } |
| func @main() { |
| %0 = toy.constant dense<[[1.000000e+00, 2.000000e+00, 3.000000e+00], [4.000000e+00, 5.000000e+00, 6.000000e+00]]> : tensor<2x3xf64> |
| %1 = toy.reshape(%0 : tensor<2x3xf64>) to tensor<2x3xf64> |
| %2 = toy.constant dense<[1.000000e+00, 2.000000e+00, 3.000000e+00, 4.000000e+00, 5.000000e+00, 6.000000e+00]> : tensor<6xf64> |
| %3 = toy.reshape(%2 : tensor<6xf64>) to tensor<2x3xf64> |
| %4 = toy.generic_call @multiply_transpose(%1, %3) : (tensor<2x3xf64>, tensor<2x3xf64>) -> tensor<*xf64> |
| %5 = toy.generic_call @multiply_transpose(%3, %1) : (tensor<2x3xf64>, tensor<2x3xf64>) -> tensor<*xf64> |
| toy.print %5 : tensor<*xf64> |
| toy.return |
| } |
| ``` |
| |
We have two calls to `multiply_transpose` that we would like to inline into
`main`, but if we look at the output nothing has changed. We are missing one
last subtle piece: there is a hidden type conversion on the edge of the call. If
we look at the above, the operands to the `generic_call` are of type
`tensor<2x3xf64>`, while the inputs to the function expect `tensor<*xf64>`. To
resolve this difference, the inliner expects an explicit cast operation to be
inserted. For this, we need to add a new operation to the Toy dialect,
`ToyCastOp` (`toy.cast`), to represent casts between two different shapes.
| |
| ```tablegen |
| def CastOp : Toy_Op<"cast", [NoSideEffect, SameOperandsAndResultShape]> { |
| let summary = "shape cast operation"; |
| let description = [{ |
| The "cast" operation converts a tensor from one type to an equivalent type |
| without changing any data elements. The source and destination types |
| must both be tensor types with the same element type. If both are ranked |
| then the rank should be the same and static dimensions should match. The |
| operation is invalid if converting to a mismatching constant dimension. |
| }]; |
| |
| let arguments = (ins F64Tensor:$input); |
| let results = (outs F64Tensor:$output); |
| |
| // Set the folder bit so that we can fold redundant cast operations. |
| let hasFolder = 1; |
| } |
| ``` |
| |
We can then override the necessary hook on `ToyInlinerInterface` to insert a
cast for us when needed:
| |
| ```c++ |
| struct ToyInlinerInterface : public DialectInlinerInterface { |
| ... |
| |
| /// Attempts to materialize a conversion for a type mismatch between a call |
| /// from this dialect, and a callable region. This method should generate an |
| /// operation that takes 'input' as the only operand, and produces a single |
| /// result of 'resultType'. If a conversion can not be generated, nullptr |
| /// should be returned. |
| Operation *materializeCallConversion(OpBuilder &builder, Value input, |
| Type resultType, |
| Location conversionLoc) const final { |
| return builder.create<CastOp>(conversionLoc, resultType, input); |
| } |
| }; |
| ``` |
| |
If we run the working example through the pipeline again, we get the expected result:
| |
| ```mlir |
| func @main() { |
| %0 = "toy.constant"() {value = dense<[[1.000000e+00, 2.000000e+00, 3.000000e+00], [4.000000e+00, 5.000000e+00, 6.000000e+00]]> : tensor<2x3xf64>} : () -> tensor<2x3xf64> |
| %1 = "toy.constant"() {value = dense<[[1.000000e+00, 2.000000e+00, 3.000000e+00], [4.000000e+00, 5.000000e+00, 6.000000e+00]]> : tensor<2x3xf64>} : () -> tensor<2x3xf64> |
| %2 = "toy.cast"(%1) : (tensor<2x3xf64>) -> tensor<*xf64> |
| %3 = "toy.cast"(%0) : (tensor<2x3xf64>) -> tensor<*xf64> |
| %4 = "toy.transpose"(%2) : (tensor<*xf64>) -> tensor<*xf64> |
| %5 = "toy.transpose"(%3) : (tensor<*xf64>) -> tensor<*xf64> |
| %6 = "toy.mul"(%4, %5) : (tensor<*xf64>, tensor<*xf64>) -> tensor<*xf64> |
| toy.print %6 : tensor<*xf64> |
| toy.return |
| } |
| ``` |
| |
| NOTE: The generic inliner will also perform simplifications, so the output may |
| be a bit cleaner than expected. |
| |
| ### Intraprocedural Shape Inference |
| |
| Now that we have inlined all of the functions, we are left with a main function |
| containing a mix of static and dynamically shaped operations. We can now write a |
| simple shape inference pass to propagate shapes intraprocedurally (within a |
| single function). We could write this as a pass that directly encodes the |
| constraints of the operations within the Toy dialect, but this seems like a good |
| candidate for a transformation that could be written generically. As a good rule |
| of thumb, it is best to express a transformation as generically as possible, |
| such that it can be extended to other dialects in the future. There is no |
| telling how many other dialects may have similar needs or encounter the same |
| problems. |
| |
| For shape inference, if we break down the problem to its core, we really just |
| want operations to tell us the expected outputs given a set of statically known |
| inputs. (We can definitely get more complex than that, but for our needs we can |
| keep it simple.) Given that this property is core to a specific operation, we |
| can define an operation interface that can be specified on operations that need |
| to have their result shapes inferred. |
| |
As with operations, we can also
| [define operation interfaces](../../OpDefinitions.md#operation-interfaces) using |
| the operation definition specification (ODS) framework. |
| |
| The interface is defined by inheriting from `OpInterface`, which takes the name |
| to be given to the generated C++ interface class as a template argument. For our |
| purposes, we will simply name the generated class `ShapeInference`. We also |
| provide a description for the interface. |
| |
| ```tablegen |
| def ShapeInferenceOpInterface : OpInterface<"ShapeInference"> { |
| let description = [{ |
| Interface to access a registered method to infer the return types for an |
| operation that can be used during type inference. |
| }]; |
| } |
| ``` |
| |
| Next, we define the interface methods that the operations will need to provide. |
An interface method comprises: a description; a C++ return type in string
| form; a method name in string form; and a few optional components, depending on |
| the need. See the |
| [ODS documentation](../../OpDefinitions.md#operation-interfaces) for more |
| information. |
| |
| ```tablegen |
| def ShapeInferenceOpInterface : OpInterface<"ShapeInference"> { |
| ... |
| |
| let methods = [ |
| InterfaceMethod<"Infer and set the output shape for the current operation.", |
| "void", "inferShapes"> |
| ]; |
| } |
| ``` |
| |
| Now that the interface is defined, we can add it to the necessary Toy operations |
in a similar way to how we added the `CallOpInterface` to `GenericCallOp`:
| |
| ```tablegen |
| def MulOp : Toy_Op<"mul", |
| [..., DeclareOpInterfaceMethods<ShapeInferenceOpInterface>]> { |
| ... |
| } |
| ``` |
| |
| Each of these operations will then need to provide a definition for the |
| `inferShapes()` method. As an example, for the mul op, the result shape is |
| inferred as the shape of the inputs. |
| |
| ```c++ |
/// Infer the output shape of the MulOp; this is required by the shape inference
| /// interface. |
| void MulOp::inferShapes() { getResult().setType(getOperand(0).getType()); } |
| ``` |
| |
At this point, each of the necessary Toy operations provides a mechanism by which
to infer its output shapes. The `ShapeInferencePass` is a `FunctionPass`: it will
run on each function in isolation. MLIR also supports general
| [OperationPasses](../../PassManagement.md#operation-pass) that run on any isolated |
| operation (i.e. other function-like operations), but here our module only |
| contains functions, so there is no need to generalize to all operations. |
| |
Implementing such a pass is done by creating a class inheriting from
`mlir::FunctionPass` (here via the `PassWrapper` CRTP utility) and overriding
the `runOnFunction()` method.
| |
| ```c++ |
| class ShapeInferencePass |
| : public mlir::PassWrapper<ShapeInferencePass, FunctionPass> { |
| void runOnFunction() override { |
| FuncOp function = getFunction(); |
| ... |
| } |
| }; |
| ``` |
| |
While we're at it, let's also create a helper method for instantiating the pass:
| |
| ```c++ |
| std::unique_ptr<mlir::Pass> mlir::toy::createShapeInferencePass() { |
| return std::make_unique<ShapeInferencePass>(); |
| } |
| ``` |
| |
The shape inference algorithm operates as follows (a condensed sketch of the
corresponding pass body follows the list):
| |
| 1. Build a worklist containing all the operations that return a dynamically |
| shaped tensor: these are the operations that need shape inference. |
2. Iterate over the worklist:
    - find an operation to process: the next ready operation in the worklist
      is one whose arguments are all non-generic (i.e. statically shaped),
| - if no operation is found, break out of the loop, |
| - remove the operation from the worklist, |
| - infer the shape of its output from the argument types. |
| 3. If the worklist is empty, the algorithm succeeded. |
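
Below is a condensed sketch of what this algorithm looks like in the pass body.
It is modeled on the tutorial's `ShapeInferencePass`, with the helper predicates
written inline; treat the exact details as illustrative:

```c++
class ShapeInferencePass
    : public mlir::PassWrapper<ShapeInferencePass, FunctionPass> {
public:
  /// An operation is ready to process once all of its operands are ranked,
  /// i.e. their shapes have already been inferred.
  static bool allOperandsInferred(Operation *op) {
    return llvm::all_of(op->getOperandTypes(), [](Type operandType) {
      return operandType.isa<RankedTensorType>();
    });
  }

  /// An operation still needs inference if any of its results is an unranked
  /// tensor.
  static bool returnsDynamicShape(Operation *op) {
    return llvm::any_of(op->getResultTypes(), [](Type resultType) {
      return !resultType.isa<RankedTensorType>();
    });
  }

  void runOnFunction() override {
    FuncOp f = getFunction();

    // Step 1: build the worklist from all operations that return a
    // dynamically shaped tensor.
    llvm::SmallPtrSet<Operation *, 16> opWorklist;
    f.walk([&](Operation *op) {
      if (returnsDynamicShape(op))
        opWorklist.insert(op);
    });

    // Step 2: repeatedly pick the next ready operation, infer its shape, and
    // remove it from the worklist.
    while (!opWorklist.empty()) {
      auto nextop = llvm::find_if(opWorklist, allOperandsInferred);
      if (nextop == opWorklist.end())
        break;
      Operation *op = *nextop;
      opWorklist.erase(op);

      // Ask the operation to infer its output shapes via the interface; the
      // full version, with error handling, is shown in the snippet below.
      if (ShapeInference shapeOp = dyn_cast<ShapeInference>(op))
        shapeOp.inferShapes();
    }

    // Step 3: an empty worklist means every shape was inferred.
    if (!opWorklist.empty())
      signalPassFailure();
  }
};
```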
| |
When processing an operation as described above, we query whether it has
registered the `ShapeInference` interface, using this code snippet:
| |
| ```c++ |
| // Ask the operation to infer its output shapes. |
| LLVM_DEBUG(llvm::dbgs() << "Inferring shape for: " << *op << "\n"); |
| |
// We check if an operation has a particular interface by casting.
| if (ShapeInference shapeOp = dyn_cast<ShapeInference>(op)) { |
| shapeOp.inferShapes(); |
| } else { |
| op->emitError("unable to infer shape of operation without shape " |
| "inference interface"); |
| return signalPassFailure(); |
| } |
| ``` |
| |
| We can then add our pass to the pass manager: |
| |
| ```c++ |
| pm.addPass(mlir::createShapeInferencePass()); |
| ``` |
| |
| If we rerun our original example, we now get the following: |
| |
| ```mlir |
| func @main() { |
| %0 = "toy.constant"() {value = dense<[[1.000000e+00, 2.000000e+00, 3.000000e+00], [4.000000e+00, 5.000000e+00, 6.000000e+00]]> : tensor<2x3xf64>} : () -> tensor<2x3xf64> |
| %1 = "toy.transpose"(%0) : (tensor<2x3xf64>) -> tensor<3x2xf64> |
| %2 = "toy.mul"(%1, %1) : (tensor<3x2xf64>, tensor<3x2xf64>) -> tensor<3x2xf64> |
| toy.print %2 : tensor<3x2xf64> |
| toy.return |
| } |
| ``` |
| |
You can build `toyc-ch4` and try it yourself: `toyc-ch4
test/Examples/Toy/Ch4/codegen.toy -emit=mlir -opt`.
| |
| In the [next chapter](Ch-5.md), we will start the process of code generation by |
targeting a lower-level dialect for optimizing some of the more compute-heavy
| Toy operations. |