| Quantization API Reference |
| ------------------------------- |
| |
| torch.quantization |
| ~~~~~~~~~~~~~~~~~~~~~ |
| |
| This module contains Eager mode quantization APIs. |
| |
| .. currentmodule:: torch.quantization |
| |
| Top level APIs |
| ^^^^^^^^^^^^^^ |
| |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| :template: classtemplate.rst |
| |
| quantize |
| quantize_dynamic |
| quantize_qat |
| prepare |
| prepare_qat |
| convert |
| |
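| For example, a minimal sketch of the eager-mode post-training static |
| quantization flow; the toy model and calibration input below are illustrative |
| assumptions, not part of this API:: |
| |
|     import torch |
| |
|     # QuantStub/DeQuantStub mark where tensors enter and leave the quantized region. |
|     class M(torch.nn.Module): |
|         def __init__(self): |
|             super().__init__() |
|             self.quant = torch.quantization.QuantStub() |
|             self.conv = torch.nn.Conv2d(1, 1, 1) |
|             self.dequant = torch.quantization.DeQuantStub() |
| |
|         def forward(self, x): |
|             x = self.quant(x) |
|             x = self.conv(x) |
|             return self.dequant(x) |
| |
|     model_fp32 = M().eval() |
|     model_fp32.qconfig = torch.quantization.get_default_qconfig("fbgemm") |
|     model_prepared = torch.quantization.prepare(model_fp32) |
|     model_prepared(torch.randn(4, 1, 8, 8))   # calibration pass |
|     model_int8 = torch.quantization.convert(model_prepared) |
| |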
| Preparing model for quantization |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| :template: classtemplate.rst |
| |
| fuse_modules |
| QuantStub |
| DeQuantStub |
| QuantWrapper |
| add_quant_dequant |
| |
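| For instance, a minimal sketch of module fusion on a hypothetical float model |
| whose ``Sequential`` children are addressed by their string indices:: |
| |
|     import torch |
| |
|     m = torch.nn.Sequential( |
|         torch.nn.Conv2d(3, 8, 3), |
|         torch.nn.BatchNorm2d(8), |
|         torch.nn.ReLU(), |
|     ).eval() |
|     # Fuse conv + bn + relu into a single intrinsic module before quantization. |
|     fused = torch.quantization.fuse_modules(m, [["0", "1", "2"]]) |
| |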
| Utility functions |
| ^^^^^^^^^^^^^^^^^ |
| |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| :template: classtemplate.rst |
| |
| add_observer_ |
| swap_module |
| propagate_qconfig_ |
| default_eval_fn |
| get_observer_dict |
| |
| torch.quantization.quantize_fx |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| This module contains FX graph mode quantization APIs (prototype). |
| |
| .. currentmodule:: torch.quantization.quantize_fx |
| |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| :template: classtemplate.rst |
| |
| prepare_fx |
| prepare_qat_fx |
| convert_fx |
| fuse_fx |
| |
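| A minimal sketch of the FX graph mode flow, assuming a release where |
| ``prepare_fx`` takes a ``QConfigMapping`` and example inputs (signatures of |
| these prototype APIs have changed across releases):: |
| |
|     import torch |
|     from torch.quantization.quantize_fx import convert_fx, prepare_fx |
|     from torch.ao.quantization.qconfig_mapping import get_default_qconfig_mapping |
| |
|     model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU()).eval() |
|     example_inputs = (torch.randn(1, 4),) |
|     prepared = prepare_fx(model, get_default_qconfig_mapping("fbgemm"), example_inputs) |
|     prepared(*example_inputs)   # calibration |
|     quantized = convert_fx(prepared) |
| |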
| torch.ao.quantization.qconfig_mapping |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| This module contains QConfigMapping for configuring FX graph mode quantization. |
| |
| .. currentmodule:: torch.ao.quantization.qconfig_mapping |
| |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| :template: classtemplate.rst |
| |
| QConfigMapping |
| get_default_qconfig_mapping |
| get_default_qat_qconfig_mapping |
| |
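| A hedged sketch of building a ``QConfigMapping``; the submodule name below is a |
| hypothetical example:: |
| |
|     import torch |
|     from torch.ao.quantization import default_dynamic_qconfig, default_qconfig |
|     from torch.ao.quantization.qconfig_mapping import QConfigMapping |
| |
|     qconfig_mapping = ( |
|         QConfigMapping() |
|         .set_global(default_qconfig)                                # fallback for all ops |
|         .set_object_type(torch.nn.Linear, default_dynamic_qconfig)  # per-type override |
|         .set_module_name("head.linear", None)                       # skip this submodule |
|     ) |
| |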
| torch.ao.quantization.backend_config |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| This module contains BackendConfig, a config object that defines how quantization is supported |
| in a backend. It is currently used only by FX Graph Mode Quantization, but Eager Mode |
| Quantization may be extended to use it as well. |
| |
| .. currentmodule:: torch.ao.quantization.backend_config |
| |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| :template: classtemplate.rst |
| |
| BackendConfig |
| BackendPatternConfig |
| DTypeConfig |
| ObservationType |
| |
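| A hedged sketch of declaring int8 support for ``torch.nn.Linear`` on a |
| hypothetical backend named ``"my_backend"``:: |
| |
|     import torch |
|     from torch.ao.quantization.backend_config import ( |
|         BackendConfig, |
|         BackendPatternConfig, |
|         DTypeConfig, |
|         ObservationType, |
|     ) |
| |
|     weighted_int8 = DTypeConfig( |
|         input_dtype=torch.quint8, |
|         output_dtype=torch.quint8, |
|         weight_dtype=torch.qint8, |
|         bias_dtype=torch.float, |
|     ) |
|     linear_config = ( |
|         BackendPatternConfig(torch.nn.Linear) |
|         .set_observation_type(ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT) |
|         .add_dtype_config(weighted_int8) |
|     ) |
|     backend_config = BackendConfig("my_backend").set_backend_pattern_config(linear_config) |
| |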
| torch.ao.quantization.fx.custom_config |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| This module contains a few CustomConfig classes that are used in both eager mode and FX graph mode quantization. |
| |
| |
| .. currentmodule:: torch.ao.quantization.fx.custom_config |
| |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| :template: classtemplate.rst |
| |
| FuseCustomConfig |
| PrepareCustomConfig |
| ConvertCustomConfig |
| StandaloneModuleConfigEntry |
| |
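| A hedged sketch of a ``PrepareCustomConfig``; the custom module classes and the |
| non-traceable submodule name are hypothetical placeholders:: |
| |
|     import torch |
|     from torch.ao.quantization.fx.custom_config import PrepareCustomConfig |
| |
|     class FloatCustomModule(torch.nn.Module): ...     # hypothetical user module |
|     class ObservedCustomModule(torch.nn.Module): ...  # its observed counterpart |
| |
|     prepare_custom_config = ( |
|         PrepareCustomConfig() |
|         .set_float_to_observed_mapping(FloatCustomModule, ObservedCustomModule) |
|         .set_non_traceable_module_names(["blocks.untraceable"]) |
|     ) |
| |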
| torch (quantization-related functions) |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| This describes the quantization-related functions of the `torch` namespace. |
| |
| .. currentmodule:: torch |
| |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| :template: classtemplate.rst |
| |
| quantize_per_tensor |
| quantize_per_channel |
| dequantize |
| |
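| For example, with hand-picked quantization parameters:: |
| |
|     import torch |
| |
|     x = torch.tensor([-1.0, 0.0, 1.0, 2.0]) |
|     q = torch.quantize_per_tensor(x, scale=0.1, zero_point=10, dtype=torch.quint8) |
|     q.int_repr()    # tensor([ 0, 10, 20, 30], dtype=torch.uint8) |
|     q.dequantize()  # tensor([-1.,  0.,  1.,  2.]) |
| |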
| torch.Tensor (quantization-related methods) |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Quantized tensors support a limited subset of the data manipulation methods |
| available on regular full-precision tensors. |
| |
| .. currentmodule:: torch.Tensor |
| |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| :template: classtemplate.rst |
| |
| view |
| as_strided |
| expand |
| flatten |
| select |
| ne |
| eq |
| ge |
| le |
| gt |
| lt |
| copy_ |
| clone |
| dequantize |
| equal |
| int_repr |
| max |
| mean |
| min |
| q_scale |
| q_zero_point |
| q_per_channel_scales |
| q_per_channel_zero_points |
| q_per_channel_axis |
| resize_ |
| sort |
| topk |
| |
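| For example:: |
| |
|     import torch |
| |
|     q = torch.quantize_per_tensor(torch.randn(2, 3), 0.05, 0, torch.qint8) |
|     q.q_scale()       # 0.05 |
|     q.q_zero_point()  # 0 |
|     q.int_repr()      # the underlying int8 tensor |
|     q.view(3, 2)      # shape methods work without dequantizing |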
| |
| torch.quantization.observer |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| This module contains observers which are used to collect statistics about |
| the values observed during calibration (PTQ) or training (QAT). |
| |
| .. currentmodule:: torch.quantization.observer |
| |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| :template: classtemplate.rst |
| |
| ObserverBase |
| MinMaxObserver |
| MovingAverageMinMaxObserver |
| PerChannelMinMaxObserver |
| MovingAveragePerChannelMinMaxObserver |
| HistogramObserver |
| PlaceholderObserver |
| RecordingObserver |
| NoopObserver |
| get_observer_state_dict |
| load_observer_state_dict |
| default_observer |
| default_placeholder_observer |
| default_debug_observer |
| default_weight_observer |
| default_histogram_observer |
| default_per_channel_weight_observer |
| default_dynamic_quant_observer |
| default_float_qparams_observer |
| |
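| For example, collecting statistics and computing quantization parameters:: |
| |
|     import torch |
|     from torch.quantization.observer import MinMaxObserver |
| |
|     obs = MinMaxObserver(dtype=torch.quint8, qscheme=torch.per_tensor_affine) |
|     for _ in range(3):            # simulated calibration batches |
|         obs(torch.randn(16)) |
|     scale, zero_point = obs.calculate_qparams() |
| |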
| torch.quantization.fake_quantize |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| This module implements modules which are used to perform fake quantization |
| during QAT. |
| |
| .. currentmodule:: torch.quantization.fake_quantize |
| |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| :template: classtemplate.rst |
| |
| FakeQuantizeBase |
| FakeQuantize |
| FixedQParamsFakeQuantize |
| FusedMovingAvgObsFakeQuantize |
| default_fake_quant |
| default_weight_fake_quant |
| default_per_channel_weight_fake_quant |
| default_histogram_fake_quant |
| default_fused_act_fake_quant |
| default_fused_wt_fake_quant |
| default_fused_per_channel_wt_fake_quant |
| disable_fake_quant |
| enable_fake_quant |
| disable_observer |
| enable_observer |
| |
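| A minimal sketch of toggling fake quantization and observation during QAT; the |
| tiny model below is an illustrative assumption:: |
| |
|     import torch |
|     import torch.quantization as tq |
| |
|     model = torch.nn.Sequential(torch.nn.Conv2d(1, 1, 1)).train() |
|     model.qconfig = tq.get_default_qat_qconfig("fbgemm") |
|     prepared = tq.prepare_qat(model) |
|     # ... train for a while, then freeze quantization parameters: |
|     prepared.apply(tq.disable_observer) |
|     # and, if desired for debugging, stop simulating quantization: |
|     prepared.apply(tq.disable_fake_quant) |
| |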
| torch.quantization.qconfig |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| This module defines `QConfig` objects which are used |
| to configure quantization settings for individual ops. |
| |
| .. currentmodule:: torch.quantization.qconfig |
| |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| :template: classtemplate.rst |
| |
| QConfig |
| default_qconfig |
| default_debug_qconfig |
| default_per_channel_qconfig |
| default_dynamic_qconfig |
| float16_dynamic_qconfig |
| float16_static_qconfig |
| per_channel_dynamic_qconfig |
| float_qparams_weight_only_qconfig |
| default_qat_qconfig |
| default_weight_only_qconfig |
| default_activation_only_qconfig |
| default_qat_qconfig_v2 |
| |
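| A hedged sketch of a custom ``QConfig`` pairing a moving-average activation |
| observer with the default weight observer:: |
| |
|     import torch |
|     from torch.quantization import QConfig |
|     from torch.quantization.observer import ( |
|         MovingAverageMinMaxObserver, |
|         default_weight_observer, |
|     ) |
| |
|     my_qconfig = QConfig( |
|         activation=MovingAverageMinMaxObserver.with_args(dtype=torch.quint8), |
|         weight=default_weight_observer, |
|     ) |
| |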
| torch.nn.intrinsic |
| ~~~~~~~~~~~~~~~~~~ |
| .. automodule:: torch.nn.intrinsic |
| .. automodule:: torch.nn.intrinsic.modules |
| |
| This module implements combined (fused) modules, such as conv + relu, which can |
| then be quantized. |
| |
| .. currentmodule:: torch.nn.intrinsic |
| |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| :template: classtemplate.rst |
| |
| ConvReLU1d |
| ConvReLU2d |
| ConvReLU3d |
| LinearReLU |
| ConvBn1d |
| ConvBn2d |
| ConvBn3d |
| ConvBnReLU1d |
| ConvBnReLU2d |
| ConvBnReLU3d |
| BNReLU2d |
| BNReLU3d |
| |
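| These fused modules are typically produced by ``fuse_modules`` rather than |
| instantiated directly; a minimal sketch:: |
| |
|     import torch |
|     import torch.nn.intrinsic |
|     import torch.quantization as tq |
| |
|     m = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval() |
|     fused = tq.fuse_modules(m, [["0", "1"]]) |
|     isinstance(fused[0], torch.nn.intrinsic.ConvReLU2d)   # True |
| |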
| torch.nn.intrinsic.qat |
| ~~~~~~~~~~~~~~~~~~~~~~ |
| .. automodule:: torch.nn.intrinsic.qat |
| .. automodule:: torch.nn.intrinsic.qat.modules |
| |
| |
| This module implements the versions of those fused operations needed for |
| quantization-aware training. |
| |
| .. currentmodule:: torch.nn.intrinsic.qat |
| |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| :template: classtemplate.rst |
| |
| LinearReLU |
| ConvBn1d |
| ConvBnReLU1d |
| ConvBn2d |
| ConvBnReLU2d |
| ConvReLU2d |
| ConvBn3d |
| ConvBnReLU3d |
| ConvReLU3d |
| update_bn_stats |
| freeze_bn_stats |
| |
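| A common QAT recipe is to freeze BatchNorm statistics late in training; a hedged |
| sketch, assuming ``fuse_modules_qat`` is available in this release:: |
| |
|     import torch |
|     import torch.ao.quantization as tq |
|     import torch.nn.intrinsic.qat as nniqat |
| |
|     m = torch.nn.Sequential( |
|         torch.nn.Conv2d(3, 8, 3), torch.nn.BatchNorm2d(8), torch.nn.ReLU() |
|     ).train() |
|     m.qconfig = tq.get_default_qat_qconfig("fbgemm") |
|     fused = tq.fuse_modules_qat(m, [["0", "1", "2"]])   # yields ConvBnReLU2d |
|     qat_model = tq.prepare_qat(fused) |
|     # ... train; late in training, stop updating BN running statistics: |
|     qat_model.apply(nniqat.freeze_bn_stats) |
| |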
| torch.nn.intrinsic.quantized |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| .. automodule:: torch.nn.intrinsic.quantized |
| .. automodule:: torch.nn.intrinsic.quantized.modules |
| |
| |
| This module implements quantized versions of fused operations such as |
| conv + relu. There are no BatchNorm variants, since batch normalization is |
| typically folded into the preceding convolution for inference. |
| |
| .. currentmodule:: torch.nn.intrinsic.quantized |
| |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| :template: classtemplate.rst |
| |
| BNReLU2d |
| BNReLU3d |
| ConvReLU1d |
| ConvReLU2d |
| ConvReLU3d |
| LinearReLU |
| |
| torch.nn.intrinsic.quantized.dynamic |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| .. automodule:: torch.nn.intrinsic.quantized.dynamic |
| .. automodule:: torch.nn.intrinsic.quantized.dynamic.modules |
| |
| This module implements dynamically quantized versions of fused operations |
| such as linear + relu. |
| |
| .. currentmodule:: torch.nn.intrinsic.quantized.dynamic |
| |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| :template: classtemplate.rst |
| |
| LinearReLU |
| |
| torch.ao.nn.qat |
| ~~~~~~~~~~~~~~~~~~~~~~ |
| .. automodule:: torch.ao.nn.qat |
| .. automodule:: torch.ao.nn.qat.modules |
| |
| This module implements versions of the key nn modules **Conv2d()** and |
| **Linear()**, which run in FP32 but with rounding applied to simulate the |
| effect of INT8 quantization. |
| |
| .. currentmodule:: torch.ao.nn.qat |
| |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| :template: classtemplate.rst |
| |
| Conv2d |
| Conv3d |
| Linear |
| |
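| A minimal sketch of swapping a float module for its QAT counterpart (this is |
| normally done for you by ``prepare_qat``):: |
| |
|     import torch |
|     import torch.quantization as tq |
|     from torch.ao.nn.qat import Linear as QATLinear |
| |
|     float_linear = torch.nn.Linear(4, 4) |
|     float_linear.qconfig = tq.get_default_qat_qconfig("fbgemm") |
|     qat_linear = QATLinear.from_float(float_linear) |
|     out = qat_linear(torch.randn(2, 4))   # FP32 math with fake quantization |
| |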
| torch.ao.nn.qat.dynamic |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| .. automodule:: torch.ao.nn.qat.dynamic |
| .. automodule:: torch.ao.nn.qat.dynamic.modules |
| |
| This module implements versions of the key nn modules such as **Linear()** |
| which run in FP32 but with rounding applied to simulate the effect of INT8 |
| quantization, and which will be dynamically quantized during inference. |
| |
| .. currentmodule:: torch.ao.nn.qat.dynamic |
| |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| :template: classtemplate.rst |
| |
| Linear |
| |
| torch.ao.nn.quantized |
| ~~~~~~~~~~~~~~~~~~~~~~ |
| .. automodule:: torch.ao.nn.quantized |
| :noindex: |
| .. automodule:: torch.ao.nn.quantized.modules |
| |
| This module implements the quantized versions of the nn layers such as |
| :class:`~torch.nn.Conv2d` and :class:`~torch.nn.ReLU`. |
| |
| .. currentmodule:: torch.ao.nn.quantized |
| |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| :template: classtemplate.rst |
| |
| ReLU6 |
| Hardswish |
| ELU |
| LeakyReLU |
| Sigmoid |
| BatchNorm2d |
| BatchNorm3d |
| Conv1d |
| Conv2d |
| Conv3d |
| ConvTranspose1d |
| ConvTranspose2d |
| ConvTranspose3d |
| Embedding |
| EmbeddingBag |
| FloatFunctional |
| FXFloatFunctional |
| QFunctional |
| Linear |
| LayerNorm |
| GroupNorm |
| InstanceNorm1d |
| InstanceNorm2d |
| InstanceNorm3d |
| |
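| For example, ``FloatFunctional`` makes free functions such as addition |
| observable in Eager mode; a minimal sketch:: |
| |
|     import torch |
|     from torch.ao.nn.quantized import FloatFunctional |
| |
|     class Add(torch.nn.Module): |
|         def __init__(self): |
|             super().__init__() |
|             self.add = FloatFunctional() |
| |
|         def forward(self, a, b): |
|             # Observed during prepare, swapped to QFunctional on convert. |
|             return self.add.add(a, b) |
| |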
| torch.ao.nn.quantized.functional |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| .. automodule:: torch.ao.nn.quantized.functional |
| |
| This module implements the quantized versions of the functional layers such as |
| :func:`~torch.nn.functional.conv2d` and :func:`~torch.nn.functional.relu`. Note: |
| :meth:`~torch.nn.functional.relu` supports quantized inputs. |
| |
| .. currentmodule:: torch.ao.nn.quantized.functional |
| |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| :template: classtemplate.rst |
| |
| avg_pool2d |
| avg_pool3d |
| adaptive_avg_pool2d |
| adaptive_avg_pool3d |
| conv1d |
| conv2d |
| conv3d |
| interpolate |
| linear |
| max_pool1d |
| max_pool2d |
| celu |
| leaky_relu |
| hardtanh |
| hardswish |
| threshold |
| elu |
| hardsigmoid |
| clamp |
| upsample |
| upsample_bilinear |
| upsample_nearest |
| |
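| These functions expect quantized input tensors; for example:: |
| |
|     import torch |
|     import torch.ao.nn.quantized.functional as qF |
| |
|     x = torch.quantize_per_tensor(torch.randn(1, 3, 8, 8), 0.1, 0, torch.quint8) |
|     y = qF.avg_pool2d(x, kernel_size=2)   # operates directly on the quantized tensor |
| |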
| torch.nn.quantizable |
| ~~~~~~~~~~~~~~~~~~~~ |
| |
| This module implements the quantizable versions of some of the nn layers. |
| These modules can be used in conjunction with the custom module mechanism by |
| providing the ``custom_module_config`` argument to both prepare and convert; |
| a hedged sketch follows the list below. |
| |
| .. currentmodule:: torch.nn.quantizable |
| |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| :template: classtemplate.rst |
| |
| LSTM |
| MultiheadAttention |
| |
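| A hedged sketch of the eager-mode custom module mechanism for the quantizable |
| LSTM; the exact configuration argument may differ across releases:: |
| |
|     import torch |
|     import torch.nn.quantizable |
|     import torch.quantization as tq |
| |
|     class M(torch.nn.Module): |
|         def __init__(self): |
|             super().__init__() |
|             self.lstm = torch.nn.LSTM(4, 4) |
| |
|         def forward(self, x): |
|             out, _ = self.lstm(x) |
|             return out |
| |
|     m = M().eval() |
|     m.qconfig = tq.get_default_qconfig("fbgemm") |
|     # Map the float LSTM to its quantizable counterpart during prepare. |
|     prepared = tq.prepare( |
|         m, |
|         prepare_custom_config_dict={ |
|             "float_to_observed_custom_module_class": { |
|                 torch.nn.LSTM: torch.nn.quantizable.LSTM |
|             } |
|         }, |
|     ) |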
| |
| torch.ao.nn.quantized.dynamic |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| .. automodule:: torch.ao.nn.quantized.dynamic |
| .. automodule:: torch.ao.nn.quantized.dynamic.modules |
| |
| Dynamically quantized :class:`~torch.nn.Linear`, :class:`~torch.nn.LSTM`, |
| :class:`~torch.nn.LSTMCell`, :class:`~torch.nn.GRUCell`, and |
| :class:`~torch.nn.RNNCell`. |
| |
| .. currentmodule:: torch.ao.nn.quantized.dynamic |
| |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| :template: classtemplate.rst |
| |
| Linear |
| LSTM |
| GRU |
| RNNCell |
| LSTMCell |
| GRUCell |
| |
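| These modules are typically produced by ``quantize_dynamic`` rather than |
| created directly; a minimal sketch:: |
| |
|     import torch |
| |
|     model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU()) |
|     dq_model = torch.quantization.quantize_dynamic( |
|         model, {torch.nn.Linear}, dtype=torch.qint8 |
|     ) |
|     # dq_model[0] is now a dynamically quantized Linear |
| |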
| Quantized dtypes and quantization schemes |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Note that operator implementations currently support per-channel quantization |
| only for the weights of the **conv** and **linear** |
| operators. Furthermore, the input data is |
| mapped linearly to the quantized data and vice versa |
| as follows: |
| |
| .. math:: |
| |
| \begin{aligned} |
| \text{Quantization:}&\\ |
| &Q_\text{out} = \text{clamp}(\text{round}(x_\text{input}/s) + z, Q_\text{min}, Q_\text{max})\\ |
| \text{Dequantization:}&\\ |
| &x_\text{out} = (Q_\text{input}-z)*s |
| \end{aligned} |
| |
| where :math:`\text{clamp}(.)` is the same as :func:`~torch.clamp` while the |
| scale :math:`s` and zero point :math:`z` are then computed |
| as described in :class:`~torch.ao.quantization.observer.MinMaxObserver`, specifically: |
| |
| .. math:: |
| |
| \begin{aligned} |
| \text{if Symmetric:}&\\ |
| &s = 2 \max(|x_\text{min}|, x_\text{max}) / |
| \left( Q_\text{max} - Q_\text{min} \right) \\ |
| &z = \begin{cases} |
| 0 & \text{if dtype is qint8} \\ |
| 128 & \text{otherwise} |
| \end{cases}\\ |
| \text{Otherwise:}&\\ |
| &s = \left( x_\text{max} - x_\text{min} \right ) / |
| \left( Q_\text{max} - Q_\text{min} \right ) \\ |
| &z = Q_\text{min} - \text{round}(x_\text{min} / s) |
| \end{aligned} |
| |
| where :math:`[x_\text{min}, x_\text{max}]` denotes the range of the input data while |
| :math:`Q_\text{min}` and :math:`Q_\text{max}` are respectively the minimum and maximum values of the quantized dtype. |
| |
| Note that the choice of :math:`s` and :math:`z` implies that zero is represented with no quantization error whenever zero is within |
| the range of the input data or symmetric quantization is being used. |
| |
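| For example, with input range :math:`[0, 2.55]`, dtype ``torch.quint8`` |
| (:math:`Q_\text{min} = 0`, :math:`Q_\text{max} = 255`) and affine quantization, |
| the formulas above give :math:`s = 2.55 / 255 = 0.01` and :math:`z = 0`; the |
| value :math:`x = 1.234` quantizes to :math:`\text{round}(1.234 / 0.01) = 123` |
| and dequantizes back to :math:`123 \times 0.01 = 1.23`. |
| |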
| Additional data types and quantization schemes can be implemented through |
| the `custom operator mechanism <https://pytorch.org/tutorials/advanced/torch_script_custom_ops.html>`_. |
| |
| * :attr:`torch.qscheme` — Type to describe the quantization scheme of a tensor. |
| Supported types: |
| |
| * :attr:`torch.per_tensor_affine` — per tensor, asymmetric |
| * :attr:`torch.per_channel_affine` — per channel, asymmetric |
| * :attr:`torch.per_tensor_symmetric` — per tensor, symmetric |
| * :attr:`torch.per_channel_symmetric` — per channel, symmetric |
| |
| * ``torch.dtype`` — Type to describe the data. Supported types: |
| |
| * :attr:`torch.quint8` — 8-bit unsigned integer |
| * :attr:`torch.qint8` — 8-bit signed integer |
| * :attr:`torch.qint32` — 32-bit signed integer |
| |
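| A short sketch tying dtypes and qschemes together via per-channel quantization:: |
| |
|     import torch |
| |
|     x = torch.randn(2, 3) |
|     scales = torch.tensor([0.1, 0.05])   # one scale per channel along axis 0 |
|     zero_points = torch.tensor([0, 0]) |
|     q = torch.quantize_per_channel(x, scales, zero_points, axis=0, dtype=torch.qint8) |
|     q.qscheme()   # torch.per_channel_affine |
|     q.dtype       # torch.qint8 |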
| |
| .. These modules are missing docs. Adding them here only for tracking |
| .. automodule:: torch.nn.quantizable |
| .. automodule:: torch.nn.quantizable.modules |
| .. automodule:: torch.nn.quantized |
| :noindex: |
| |
| .. automodule:: torch.ao.nn.quantized.reference |
| :noindex: |
| .. automodule:: torch.ao.nn.quantized.reference.modules |
| :noindex: |