| torch.cuda |
| =================================== |
| .. automodule:: torch.cuda |
| .. currentmodule:: torch.cuda |
| |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| |
| StreamContext |
| can_device_access_peer |
| current_blas_handle |
| current_device |
| current_stream |
| cudart |
| default_stream |
| device |
| device_count |
| device_of |
| get_arch_list |
| get_device_capability |
| get_device_name |
| get_device_properties |
| get_gencode_flags |
| get_sync_debug_mode |
| init |
| ipc_collect |
| is_available |
| is_initialized |
| memory_usage |
| set_device |
| set_stream |
| set_sync_debug_mode |
| stream |
| synchronize |
| utilization |
| temperature |
| power_draw |
| clock_rate |
| OutOfMemoryError |
| |
| Random Number Generator |
| ------------------------- |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| |
| get_rng_state |
| get_rng_state_all |
| set_rng_state |
| set_rng_state_all |
| manual_seed |
| manual_seed_all |
| seed |
| seed_all |
| initial_seed |
| |
| |
| Communication collectives |
| ------------------------- |
| |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| |
| comm.broadcast |
| comm.broadcast_coalesced |
| comm.reduce_add |
| comm.scatter |
| comm.gather |
| |
| Streams and events |
| ------------------ |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| |
| Stream |
| ExternalStream |
| Event |
| |
| Graphs (beta) |
| ------------- |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| |
| is_current_stream_capturing |
| graph_pool_handle |
| CUDAGraph |
| graph |
| make_graphed_callables |
| |
| .. _cuda-memory-management-api: |
| |
| Memory management |
| ----------------- |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| |
| empty_cache |
| list_gpu_processes |
| mem_get_info |
| memory_stats |
| memory_summary |
| memory_snapshot |
| memory_allocated |
| max_memory_allocated |
| reset_max_memory_allocated |
| memory_reserved |
| max_memory_reserved |
| set_per_process_memory_fraction |
| memory_cached |
| max_memory_cached |
| reset_max_memory_cached |
| reset_peak_memory_stats |
| caching_allocator_alloc |
| caching_allocator_delete |
| get_allocator_backend |
| CUDAPluggableAllocator |
| change_current_allocator |
| MemPool |
| MemPoolContext |
| |
| .. autoclass:: torch.cuda.use_mem_pool |
| |
| .. FIXME The following doesn't seem to exist. Is it supposed to? |
| https://github.com/pytorch/pytorch/issues/27785 |
| .. autofunction:: reset_max_memory_reserved |
| |
| NVIDIA Tools Extension (NVTX) |
| ----------------------------- |
| |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| |
| nvtx.mark |
| nvtx.range_push |
| nvtx.range_pop |
| nvtx.range |
| |
| Jiterator (beta) |
| ----------------------------- |
| .. autosummary:: |
| :toctree: generated |
| :nosignatures: |
| |
| jiterator._create_jit_fn |
| jiterator._create_multi_output_jit_fn |
| |
| TunableOp |
| --------- |
| |
Some operations can be implemented using more than one library or more than one
technique. For example, a GEMM could be implemented for CUDA or ROCm using
either the cublas/cublasLt libraries or the hipblas/hipblasLt libraries,
respectively. How does one know which implementation is the fastest and should
be chosen? That's what TunableOp provides. Certain operators have been
implemented as Tunable Operators using multiple strategies. At runtime, all
strategies are profiled and the fastest one is selected for all subsequent
operations.
| |
| See the :doc:`documentation <cuda.tunable>` for information on how to use it. |
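
A minimal sketch of enabling tuning, assuming the ``torch.cuda.tunable.enable``
API described on that page (the ``PYTORCH_TUNABLEOP_ENABLED=1`` environment
variable is an alternative way to switch it on):

.. code-block:: python

    import torch

    # Assumption: the torch.cuda.tunable API from the page linked above.
    # Turn tuning on for the rest of the program.
    torch.cuda.tunable.enable(True)

    a = torch.randn(1024, 1024, device="cuda")
    b = torch.randn(1024, 1024, device="cuda")

    # The first GEMM with this shape profiles the available strategies;
    # later GEMMs with the same shape reuse the fastest one found.
    c = a @ b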
| |
| .. toctree:: |
| :hidden: |
| |
| cuda.tunable |
| |
| |
| Stream Sanitizer (prototype) |
| ---------------------------- |
| |
| CUDA Sanitizer is a prototype tool for detecting synchronization errors between streams in PyTorch. |
| See the :doc:`documentation <cuda._sanitizer>` for information on how to use it. |
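
As a minimal sketch, assuming the ``enable_cuda_sanitizer`` entry point and the
``TORCH_CUDA_SANITIZER`` environment variable described on that page:

.. code-block:: python

    import torch
    import torch.cuda._sanitizer as csan

    # Turn the sanitizer on for the rest of the program
    # (setting TORCH_CUDA_SANITIZER=1 before startup is the usual alternative).
    csan.enable_cuda_sanitizer()

    side = torch.cuda.Stream()
    x = torch.zeros(8192, device="cuda")
    with torch.cuda.stream(side):
        # x was written on the default stream and is read here on ``side``
        # without synchronization, so the sanitizer should flag a data race.
        y = x.sum()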
| |
| .. toctree:: |
| :hidden: |
| |
| cuda._sanitizer |
| |
| |
.. These modules need to be documented. Adding them here in the meantime
.. for tracking purposes
| .. py:module:: torch.cuda.comm |
| .. py:module:: torch.cuda.error |
| .. py:module:: torch.cuda.gds |
| .. py:module:: torch.cuda.graphs |
| .. py:module:: torch.cuda.jiterator |
| .. py:module:: torch.cuda.memory |
| .. py:module:: torch.cuda.nccl |
| .. py:module:: torch.cuda.nvtx |
| .. py:module:: torch.cuda.profiler |
| .. py:module:: torch.cuda.random |
| .. py:module:: torch.cuda.sparse |
| .. py:module:: torch.cuda.streams |