
 # Output pipelines in gemmlowp

 In gemmlowp, the "output pipeline" is the process that takes a final `int32`
 accumulator value (the output of the compute/kernel stage), and processes it to
 obtain the final value (typically a `uint8` value) and write it to the
 destination matrix.

 Gemmlowp is flexible about which arithmetic transformations take place in
 the output pipeline, so as to allow different users to implement different
 quantization paradigms. See [low-precision.md](low-precision.md) and
 [quantization.md](quantization.md).

 Besides implementing a quantization paradigm, output pipelines are also useful
 for implementing fused operations, where a matrix
 multiplication feeds into other operations applied to its result, without
 additional array traversals. For instance, when implementing neural network
 inference, one might have a Convolutional layer with a bias-addition and an
 activation. One then wants to feed the result of the matrix multiplication
 implementing the Convolutional operator itself, directly into the bias-addition
 and activation function. gemmlowp's output pipelines allow implementing that:
 the bias-addition and activation function are just additional stages in the
 output pipeline.
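
 To make the idea of fusion concrete, here is a minimal standalone sketch. It
 does not use gemmlowp; the function name `FusedMatMulBiasRelu`, the row-major
 layout, the per-row bias, and the ReLU-style activation are all assumptions
 chosen for illustration. The point it shows is that the bias-addition,
 activation, and final cast to `uint8` are applied to each `int32` accumulator
 right before it is stored, so the result matrix is never traversed a second
 time.

 ```cpp
 #include <cstddef>
 #include <cstdint>
 #include <vector>

 // Hypothetical illustration, not gemmlowp code.
 // Computes dst = cast_to_uint8(relu(lhs * rhs + bias)) in a single pass.
 // All matrices are row-major; bias has one entry per result row.
 std::vector<std::uint8_t> FusedMatMulBiasRelu(
     const std::vector<std::int32_t>& lhs,   // rows x depth
     const std::vector<std::int32_t>& rhs,   // depth x cols
     const std::vector<std::int32_t>& bias,  // rows entries
     std::size_t rows, std::size_t depth, std::size_t cols) {
   std::vector<std::uint8_t> dst(rows * cols);
   for (std::size_t r = 0; r < rows; ++r) {
     for (std::size_t c = 0; c < cols; ++c) {
       // Compute stage: accumulate the dot product in int32.
       std::int32_t acc = 0;
       for (std::size_t d = 0; d < depth; ++d) {
         acc += lhs[r * depth + d] * rhs[d * cols + c];
       }
       // "Output pipeline": applied to the accumulator before the store.
       acc += bias[r];            // bias-addition
       if (acc < 0) acc = 0;      // ReLU-style activation
       if (acc > 255) acc = 255;  // saturate to the uint8 range
       dst[r * cols + c] = static_cast<std::uint8_t>(acc);
     }
   }
   return dst;
 }
 ```

 For example, calling `FusedMatMulBiasRelu({1, 2, 3, 4}, {5, 6, 7, 8},
 {-20, 210}, 2, 2, 2)` stores the four already-processed `uint8` values
 directly, without materializing an intermediate `int32` result matrix.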

 ## Usage

 The gemmlowp entry point that accepts an arbitrary output pipeline is
 `GemmWithOutputPipeline` in [public/gemmlowp.h](../public/gemmlowp.h).

 The output pipeline is specified as a `std::tuple` of "output stages", each of
 which defines an elementary arithmetic transformation.

 All available output stages are defined in
 [public/output_stages.h](../public/output_stages.h).
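
 The following standalone sketch illustrates how a `std::tuple` of stages can
 act as a pipeline. It is a conceptual model only, not gemmlowp's
 implementation: the stage types (`AddOffsetStage`, `RoundingRightShiftStage`,
 `ClampToUint8RangeStage`) and the helpers `RunFrom` and `RunPipeline` are
 hypothetical names invented for this example. For the real stage types, refer
 to the header linked above.

 ```cpp
 #include <cstddef>
 #include <cstdint>
 #include <tuple>
 #include <type_traits>

 // Hypothetical stage: adds a constant offset to the accumulator.
 struct AddOffsetStage {
   std::int32_t offset;
   std::int32_t operator()(std::int32_t x) const { return x + offset; }
 };

 // Hypothetical stage: right shift with rounding.
 // Assumes an arithmetic right shift for negative values (guaranteed since
 // C++20, and the behavior of mainstream compilers before that).
 struct RoundingRightShiftStage {
   int shift;
   std::int32_t operator()(std::int32_t x) const {
     const std::int32_t rounding = shift > 0 ? (1 << (shift - 1)) : 0;
     return (x + rounding) >> shift;
   }
 };

 // Hypothetical stage: clamps the value to the range [0, 255].
 struct ClampToUint8RangeStage {
   std::int32_t operator()(std::int32_t x) const {
     return x < 0 ? 0 : (x > 255 ? 255 : x);
   }
 };

 // End of recursion: every stage has been applied.
 template <std::size_t I, typename Pipeline>
 typename std::enable_if<I == std::tuple_size<Pipeline>::value,
                         std::int32_t>::type
 RunFrom(const Pipeline&, std::int32_t x) {
   return x;
 }

 // Applies stage I, then hands the result to the remaining stages.
 template <std::size_t I, typename Pipeline>
 typename std::enable_if<(I < std::tuple_size<Pipeline>::value),
                         std::int32_t>::type
 RunFrom(const Pipeline& pipeline, std::int32_t x) {
   return RunFrom<I + 1>(pipeline, std::get<I>(pipeline)(x));
 }

 // Runs all stages in tuple order on one int32 accumulator. The final cast
 // assumes the last stage has already clamped the value to [0, 255].
 template <typename Pipeline>
 std::uint8_t RunPipeline(const Pipeline& pipeline, std::int32_t accumulator) {
   return static_cast<std::uint8_t>(RunFrom<0>(pipeline, accumulator));
 }
 ```

 With `std::make_tuple(AddOffsetStage{-100}, RoundingRightShiftStage{2},
 ClampToUint8RangeStage{})`, an accumulator is offset, scaled down, and
 clamped, in that order. Because the stages are composed at compile time,
 adding or reordering a transformation only means changing the tuple.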

 ## Example usage

 The best place to see examples of using various output pipelines is the unit
 test,

 ```
 test/test.cc
 ```

 specifically in this function:

 ```
 TestOutputStages
 ```

 Separately, a self-contained example showing how to use gemmlowp to compute a
 quantized matrix multiplication with a sound quantization paradigm is here:

 [doc/quantization_example.cc](quantization_example.cc)