| .. currentmodule:: torch.cuda._sanitizer |
| |
| CUDA Stream Sanitizer |
| ===================== |
| |
| .. note:: |
| This is a prototype feature, which means it is at an early stage |
| for feedback and testing, and its components are subject to change. |
| |
| Overview |
| -------- |
| |
| .. automodule:: torch.cuda._sanitizer |
| |
| |
| Usage |
| ------ |
| |
| Here is an example of a simple synchronization error in PyTorch: |
| |
| :: |
| |
| import torch |
| |
    a = torch.rand(10000, device="cuda")
| |
| with torch.cuda.stream(torch.cuda.Stream()): |
| torch.mul(a, 5, out=a) |
| |
The ``a`` tensor is initialized on the default stream and, without any
synchronization, modified on a new stream. The two kernels may run concurrently
on the same tensor, so the second kernel might read uninitialized data before
the first one has written it, or the first kernel might overwrite part of the
second one's result.
When this script is run on the command line with:
| :: |
| |
| TORCH_CUDA_SANITIZER=1 python example_error.py |
| |
| the following output is printed by CSAN: |
| |
| :: |
| |
| ============================ |
| CSAN detected a possible data race on tensor with data pointer 139719969079296 |
| Access by stream 94646435460352 during kernel: |
| aten::mul.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) |
| writing to argument(s) self, out, and to the output |
| With stack trace: |
| File "example_error.py", line 6, in <module> |
| torch.mul(a, 5, out=a) |
| ... |
| File "pytorch/torch/cuda/_sanitizer.py", line 364, in _handle_kernel_launch |
| stack_trace = traceback.StackSummary.extract( |
| |
| Previous access by stream 0 during kernel: |
| aten::rand(int[] size, *, int? dtype=None, Device? device=None) -> Tensor |
| writing to the output |
| With stack trace: |
| File "example_error.py", line 3, in <module> |
| a = torch.rand(10000, device="cuda") |
| ... |
| File "pytorch/torch/cuda/_sanitizer.py", line 364, in _handle_kernel_launch |
| stack_trace = traceback.StackSummary.extract( |
| |
| Tensor was allocated with stack trace: |
| File "example_error.py", line 3, in <module> |
| a = torch.rand(10000, device="cuda") |
| ... |
| File "pytorch/torch/cuda/_sanitizer.py", line 420, in _handle_memory_allocation |
| traceback.StackSummary.extract( |
| |
| This gives extensive insight into the origin of the error: |
| |
- A tensor was incorrectly accessed from streams with IDs 0 (the default stream) and 94646435460352 (the new stream)
- The tensor was allocated by invoking ``a = torch.rand(10000, device="cuda")``
- The faulty accesses were caused by the operators:

  - ``a = torch.rand(10000, device="cuda")`` on stream 0
  - ``torch.mul(a, 5, out=a)`` on stream 94646435460352

- The error message also displays the schemas of the invoked operators, along with a note
  showing which arguments of the operators correspond to the affected tensor.

  - In the example, tensor ``a`` corresponds to the arguments ``self`` and ``out``,
    and to the output value of the invoked operator ``torch.mul``.
| |
| .. seealso:: |
| The list of supported torch operators and their schemas can be viewed |
| :doc:`here <torch>`. |
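
For quick inspection, the schema of an individual operator overload can also be
printed from Python. Note that this sketch relies on the private ``_schema``
attribute of ``torch.ops`` overloads, which is not a stable API:

::

    import torch

    # Prints: aten::mul.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!)
    print(torch.ops.aten.mul.out._schema)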
| |
| The bug can be fixed by forcing the new stream to wait for the default stream: |
| |
| :: |
| |
    with torch.cuda.stream(torch.cuda.Stream()):
        # Make the new stream wait until the default stream has finished
        # writing to a before modifying it here.
        torch.cuda.current_stream().wait_stream(torch.cuda.default_stream())
        torch.mul(a, 5, out=a)
| |
When the script is run again, no errors are reported.
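
The same ordering can also be expressed with a CUDA event instead of
``wait_stream``. The sketch below is not part of the original example; it records
an event on the default stream once ``a`` has been initialized and makes the new
stream wait for that event:

::

    import torch

    a = torch.rand(10000, device="cuda")

    # Mark the point on the default stream at which a is fully initialized.
    init_done = torch.cuda.Event()
    init_done.record(torch.cuda.default_stream())

    side_stream = torch.cuda.Stream()
    # All work submitted to side_stream after this call waits for the event.
    side_stream.wait_event(init_done)
    with torch.cuda.stream(side_stream):
        torch.mul(a, 5, out=a)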
| |
| API Reference |
| ------------- |
| |
| .. autofunction:: enable_cuda_sanitizer |
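
For example, a minimal sketch of enabling the sanitizer from within a script,
as an alternative to setting the ``TORCH_CUDA_SANITIZER`` environment variable:

::

    import torch
    import torch.cuda._sanitizer as csan

    # Enable CSAN programmatically; call this before the CUDA work to be checked.
    csan.enable_cuda_sanitizer()

    a = torch.rand(10000, device="cuda")
    with torch.cuda.stream(torch.cuda.Stream()):
        torch.mul(a, 5, out=a)  # CSAN reports the data race shown above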