.. _reproducibility:

Reproducibility
===============

Completely reproducible results are not guaranteed across PyTorch releases,
individual commits, or different platforms. Furthermore, results may not be
reproducible between CPU and GPU executions, even when using identical seeds.

However, there are some steps you can take to limit the number of sources of
nondeterministic behavior for a specific platform, device, and PyTorch release.
First, you can control sources of randomness that can cause multiple executions
of your application to behave differently. Second, you can configure PyTorch
to avoid using nondeterministic algorithms for some operations, so that multiple
calls to those operations, given the same inputs, will produce the same result.

.. warning::

    Deterministic operations are often slower than nondeterministic operations, so
    single-run performance may decrease for your model. However, determinism may
    save time in development by facilitating experimentation, debugging, and
    regression testing.

Controlling sources of randomness
.................................

PyTorch random number generator
-------------------------------
You can use :meth:`torch.manual_seed()` to seed the RNG for all devices (both
CPU and CUDA)::

    import torch
    torch.manual_seed(0)

Some PyTorch operations may use random numbers internally.
:meth:`torch.svd_lowrank()` does this, for instance. Consequently, calling it
multiple times back-to-back with the same input arguments may give different
results. However, as long as :meth:`torch.manual_seed()` is set to a constant
at the beginning of an application and all other sources of nondeterminism have
been eliminated, the same series of random numbers will be generated each time
the application is run in the same environment.

It is also possible to obtain identical results from an operation that uses
random numbers by setting :meth:`torch.manual_seed()` to the same value between
subsequent calls.
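
For example, re-seeding right before each call should make two back-to-back
calls of a randomized operation agree. A minimal sketch using
:meth:`torch.svd_lowrank()`::

    import torch

    A = torch.randn(20, 20)

    torch.manual_seed(0)
    U1, S1, V1 = torch.svd_lowrank(A)

    torch.manual_seed(0)
    U2, S2, V2 = torch.svd_lowrank(A)

    # With the seed restored between calls, the factorizations should match.
    print(torch.allclose(U1, U2), torch.allclose(S1, S2), torch.allclose(V1, V2))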

Python
------

For custom operators, you might need to set the Python seed as well::

    import random
    random.seed(0)

Random number generators in other libraries
-------------------------------------------
If you or any of the libraries you are using rely on NumPy, you can seed the global
NumPy RNG with::

    import numpy as np
    np.random.seed(0)

However, some applications and libraries may use NumPy Random Generator objects,
not the global RNG
(`<https://numpy.org/doc/stable/reference/random/generator.html>`_), and those will
need to be seeded consistently as well.
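
For example, code that follows the newer NumPy API usually constructs its own
``Generator``; giving it an explicit seed is the Generator-based counterpart of
``np.random.seed(0)`` (a minimal sketch)::

    import numpy as np

    # Seed a new-style Generator object rather than the global RNG.
    rng = np.random.default_rng(0)
    print(rng.random(3))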

If you are using any other libraries that use random number generators, refer to
the documentation for those libraries to see how to set consistent seeds for them.

CUDA convolution benchmarking
-----------------------------
The cuDNN library, used by CUDA convolution operations, can be a source of nondeterminism
across multiple executions of an application. When a cuDNN convolution is called with a
new set of size parameters, an optional feature can run multiple convolution algorithms,
benchmarking them to find the fastest one. Then, the fastest algorithm will be used
consistently during the rest of the process for the corresponding set of size parameters.
Due to benchmarking noise and different hardware, the benchmark may select different
algorithms on subsequent runs, even on the same machine.

Disabling the benchmarking feature with :code:`torch.backends.cudnn.benchmark = False`
causes cuDNN to deterministically select an algorithm, possibly at the cost of reduced
performance.

However, if you do not need reproducibility across multiple executions of your application,
then performance might improve if the benchmarking feature is enabled with
:code:`torch.backends.cudnn.benchmark = True`.
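
For example, a run that needs to be reproducible might disable the feature
explicitly at startup (a minimal sketch)::

    import torch

    # Pick convolution algorithms deterministically instead of benchmarking.
    torch.backends.cudnn.benchmark = False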

Note that this setting is different from the :code:`torch.backends.cudnn.deterministic`
setting discussed below.

Avoiding nondeterministic algorithms
....................................
:meth:`torch.use_deterministic_algorithms` lets you configure PyTorch to use
deterministic algorithms instead of nondeterministic ones where available, and
to throw an error if an operation is known to be nondeterministic (and without
a deterministic alternative).

Please check the documentation for :meth:`torch.use_deterministic_algorithms()`
for a full list of affected operations. If an operation does not act correctly
according to the documentation, or if you need a deterministic implementation
of an operation that does not have one, please submit an issue:
`<https://github.com/pytorch/pytorch/issues?q=label:%22module:%20determinism%22>`_

For example, running the nondeterministic CUDA implementation of :meth:`torch.Tensor.index_add_`
will throw an error::

    >>> import torch
    >>> torch.use_deterministic_algorithms(True)
    >>> torch.randn(2, 2).cuda().index_add_(0, torch.tensor([0, 1]), torch.randn(2, 2))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    RuntimeError: index_add_cuda_ does not have a deterministic implementation, but you set
    'torch.use_deterministic_algorithms(True)'. ...

When :meth:`torch.bmm` is called with sparse-dense CUDA tensors it typically uses a
nondeterministic algorithm, but when the deterministic flag is turned on, its alternate
deterministic implementation will be used::

    >>> import torch
    >>> torch.use_deterministic_algorithms(True)
    >>> torch.bmm(torch.randn(2, 2, 2).to_sparse().cuda(), torch.randn(2, 2, 2).cuda())
    tensor([[[ 1.1900, -2.3409],
             [ 0.4796,  0.8003]],
            [[ 0.1509,  1.8027],
             [ 0.0333, -1.1444]]], device='cuda:0')

Furthermore, if you are using CUDA tensors, and your CUDA version is 10.2 or greater, you
should set the environment variable `CUBLAS_WORKSPACE_CONFIG` according to CUDA documentation:
`<https://docs.nvidia.com/cuda/cublas/index.html#results-reproducibility>`_
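
For example, the variable can be exported in the shell before launching the
program, or set from Python before any CUDA work runs; ``:4096:8`` and
``:16:8`` are the values listed in the CUDA documentation linked above
(a minimal sketch)::

    import os

    # Set before cuBLAS is initialized, i.e. before the first CUDA operation.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

    import torch
    torch.use_deterministic_algorithms(True)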

CUDA convolution determinism
----------------------------
While disabling CUDA convolution benchmarking (discussed above) ensures that
CUDA selects the same algorithm each time an application is run, that algorithm
itself may be nondeterministic, unless either
:code:`torch.use_deterministic_algorithms(True)` or
:code:`torch.backends.cudnn.deterministic = True` is set. The latter setting
controls only this behavior, unlike :meth:`torch.use_deterministic_algorithms`
which will make other PyTorch operations behave deterministically, too.
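
A minimal sketch contrasting the two settings::

    import torch

    # Affects only the choice of cuDNN convolution algorithms.
    torch.backends.cudnn.deterministic = True

    # Also makes other PyTorch operations choose deterministic algorithms,
    # or raise an error when no deterministic implementation is available.
    torch.use_deterministic_algorithms(True)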

CUDA RNN and LSTM
-----------------
In some versions of CUDA, RNNs and LSTM networks may have non-deterministic behavior.
See :meth:`torch.nn.RNN` and :meth:`torch.nn.LSTM` for details and workarounds.

Filling uninitialized memory
----------------------------
Operations like :meth:`torch.empty` and :meth:`torch.Tensor.resize_` can return
tensors with uninitialized memory that contain undefined values. Using such a
tensor as an input to another operation is invalid if determinism is required,
because the output will be nondeterministic. But there is nothing to actually
prevent such invalid code from being run. So for safety,
:attr:`torch.utils.deterministic.fill_uninitialized_memory` is set to ``True``
by default, which will fill the uninitialized memory with a known value if
:code:`torch.use_deterministic_algorithms(True)` is set. This prevents
the possibility of this kind of nondeterministic behavior.

However, filling uninitialized memory is detrimental to performance. So if your
program is valid and does not use uninitialized memory as the input to an
operation, then this setting can be turned off for better performance.
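
A minimal sketch of disabling the fill when you know that no tensor with
uninitialized memory is ever used as an input::

    import torch

    torch.use_deterministic_algorithms(True)
    # Skip filling tensors from torch.empty() / resize_() with a known value.
    torch.utils.deterministic.fill_uninitialized_memory = False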

DataLoader
..........

DataLoader will reseed workers following the :ref:`data-loading-randomness` algorithm.
Use :meth:`worker_init_fn` and `generator` to preserve reproducibility::

    import random

    import numpy
    import torch
    from torch.utils.data import DataLoader

    def seed_worker(worker_id):
        # Derive a per-worker seed from the base seed PyTorch assigns to this
        # worker, so NumPy and Python RNGs differ across workers reproducibly.
        worker_seed = torch.initial_seed() % 2**32
        numpy.random.seed(worker_seed)
        random.seed(worker_seed)

    g = torch.Generator()
    g.manual_seed(0)

    DataLoader(
        train_dataset,
        batch_size=batch_size,
        num_workers=num_workers,
        worker_init_fn=seed_worker,
        generator=g,
    )