Inference Mode
==============
``c10::InferenceMode`` is a new RAII guard analogous to ``NoGradMode``
to be used when you are certain your operations will have no interactions
with autograd (e.g. model inference). Compared to ``NoGradMode``, code run
under this mode gets better performance by disabling autograd-related work like
view tracking and version counter bumps. However, tensors created inside
``c10::InferenceMode`` also have more limitations when interacting with the autograd system.
``InferenceMode`` can be enabled for a given block of code. Inside ``InferenceMode``,
all newly allocated (non-view) tensors are marked as inference tensors. Inference tensors:

- do not have a version counter, so an error will be raised if you try to read their version
  (e.g., because you saved this tensor for backward).
- are immutable outside ``InferenceMode``, so an error will be raised if you try to:

  - mutate their data outside ``InferenceMode``.
  - mutate them into ``requires_grad=True`` outside ``InferenceMode``.

  To work around this, you can make a clone outside ``InferenceMode`` to get a normal tensor before mutating.

A non-view tensor is an inference tensor if and only if it was allocated inside ``InferenceMode``.
A view tensor is an inference tensor if and only if the tensor it is a view of is an inference tensor.
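
A minimal sketch of these rules, assuming a standard LibTorch setup (the commented-out lines mark
calls that would raise errors):

.. code-block:: cpp

    #include <torch/torch.h>

    int main() {
      torch::Tensor t;
      {
        c10::InferenceMode guard;
        t = torch::ones({2, 2});   // allocated inside the guard => inference tensor
        t.add_(1);                 // OK: mutating an inference tensor inside InferenceMode
        auto v = t.view({4});      // a view of an inference tensor is also an inference tensor
      }
      // t.add_(1);                // error: inference tensors are immutable outside InferenceMode
      // t.requires_grad_(true);   // error for the same reason
      torch::Tensor normal = t.clone();  // workaround: clone outside the guard...
      normal.add_(1);                    // ...then mutate the normal clone
      return 0;
    }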

Inside an ``InferenceMode`` block, we make the following performance guarantees:

- Like ``NoGradMode``, operations do not record ``grad_fn`` even if their inputs have ``requires_grad=True``.
  This applies to both inference tensors and normal tensors.
- View operations on inference tensors do not do view tracking; view and non-view inference tensors are
  indistinguishable.
- Inplace operations on inference tensors are guaranteed not to bump the version counter.
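
The first guarantee can be observed directly: even with a ``requires_grad=True`` input, an op run under
the guard produces an output that is detached from the autograd graph. A sketch, assuming LibTorch:

.. code-block:: cpp

    #include <torch/torch.h>
    #include <iostream>

    int main() {
      auto x = torch::ones({2, 2}, torch::requires_grad());
      torch::Tensor y;
      {
        c10::InferenceMode guard;
        y = x * 2;  // no grad_fn is recorded, even though x requires grad
      }
      std::cout << y.requires_grad() << std::endl;  // prints 0: y is detached
      // y.sum().backward();  // would fail: no graph was recorded
      return 0;
    }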

For more implementation details of ``InferenceMode``, please see `RFC-0011-InferenceMode <https://github.com/pytorch/rfcs/pull/17>`_.
Migration guide from ``AutoNonVariableTypeMode``
------------------------------------------------
In production use of PyTorch for inference workloads, we have seen a proliferation
of uses of the C++ guard ``AutoNonVariableTypeMode`` (now ``AutoDispatchBelowADInplaceOrView``),
which disables autograd, view tracking and version counter bumps. Unfortunately,
the current colloquial use of this guard for inference workloads is unsafe: it's possible to
use ``AutoNonVariableTypeMode`` to bypass PyTorch's safety checks and produce
silently wrong results. For example, PyTorch normally throws an error when tensors saved for backward
are subsequently mutated, but a mutation that happens inside ``AutoNonVariableTypeMode``
silently bypasses the check and can return wrong gradients to users.
When current users of ``AutoNonVariableTypeMode`` think about migrating, the following
steps might help you decide the best alternative:
1. Users trying to run a workload in inference-only mode (like loading a pretrained JIT model and
   running inference in a C++ runtime) should add a ``c10::InferenceMode`` guard to guard all operations
   on tensors (including model loading). See the inference workload example below:

   .. code-block:: cpp

      c10::InferenceMode guard;
      model.load_jit(saved_model);
      auto inputs = preprocess_tensors(data);
      auto out = model.forward(inputs);
      auto outputs = postprocess_tensors(out);

   Note ``c10::InferenceMode`` offers a drop-in replacement for ``AutoNonVariableTypeMode`` which preserves
   the performance characteristics of ``AutoNonVariableTypeMode``. But they also have some differences that
   users should pay additional attention to:

   - Both guards affect the tensor execution process, skipping work that is not related to inference, but ``InferenceMode``
     also affects tensor creation while ``AutoNonVariableTypeMode`` doesn't. In other words, tensors created
     inside ``InferenceMode`` are marked as inference tensors, so that certain limitations apply after
     exiting ``InferenceMode`` (see the creation sketch after the nesting example below).
   - Enabled/disabled ``InferenceMode`` states can be nested, while ``AutoNonVariableTypeMode`` only allows the enabled state.

   .. code-block:: cpp

      {
        InferenceMode guard(true);
        // InferenceMode is on
        {
          InferenceMode guard(false);
          // InferenceMode is off
        }
        // InferenceMode is on
      }
      // InferenceMode is off
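
   To make the creation difference concrete, here is a sketch contrasting the two guards; it assumes the
   ``Tensor::is_inference()`` accessor available in recent LibTorch releases:

   .. code-block:: cpp

      #include <torch/torch.h>
      #include <iostream>

      int main() {
        torch::Tensor a, b;
        {
          c10::InferenceMode guard;
          a = torch::ones({2, 2});  // marked as an inference tensor
        }
        {
          at::AutoDispatchBelowADInplaceOrView guard;
          b = torch::ones({2, 2});  // a normal tensor: creation is unaffected
        }
        // assumption: is_inference() exposes the inference-tensor flag
        std::cout << a.is_inference() << " " << b.is_inference() << std::endl;  // 1 0
        return 0;
      }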

2. Users trying to implement a customized kernel that redispatches under the ``Autograd`` dispatch
   keys should use ``AutoDispatchBelowADInplaceOrView`` instead. Note ``AutoDispatchBelowADInplaceOrView`` is just a new name
   for ``AutoNonVariableTypeMode``, since it explains the guard's functionality better. We're deprecating
   ``AutoNonVariableTypeMode`` and it'll be removed in the 1.10 release. See the customized kernel
   ``ROIAlignFunction`` in ``pytorch/vision`` for an example:

   .. code-block:: cpp

      class ROIAlignFunction : public torch::autograd::Function<ROIAlignFunction> {
       public:
        static torch::autograd::variable_list forward(
            torch::autograd::AutogradContext* ctx,
            const torch::autograd::Variable& input,
            const torch::autograd::Variable& rois,
            double spatial_scale,
            int64_t pooled_height,
            int64_t pooled_width,
            int64_t sampling_ratio,
            bool aligned) {
          ctx->saved_data["spatial_scale"] = spatial_scale;
          ctx->saved_data["pooled_height"] = pooled_height;
          ctx->saved_data["pooled_width"] = pooled_width;
          ctx->saved_data["sampling_ratio"] = sampling_ratio;
          ctx->saved_data["aligned"] = aligned;
          ctx->saved_data["input_shape"] = input.sizes();
          ctx->save_for_backward({rois});
          // Used to be at::AutoNonVariableTypeMode g;
          at::AutoDispatchBelowADInplaceOrView guard;
          auto result = roi_align(
              input, rois, spatial_scale, pooled_height,
              pooled_width, sampling_ratio, aligned);
          return {result};
        }

   Customized inplace & view kernels need some special handling in addition to the guard above; see the
   `custom kernel tutorial <https://pytorch.org/tutorials/advanced/cpp_extension.html#backward-pass>`_
   for more details.