.. role:: hidden
    :class: hidden-section

Tensor Parallelism - torch.distributed.tensor.parallel
======================================================

Tensor Parallelism (TP) is built on top of the PyTorch DistributedTensor
(`DTensor <https://github.com/pytorch/pytorch/blob/main/torch/distributed/_tensor/README.md>`__)
and provides different parallelism styles: Colwise and Rowwise Parallelism.

.. warning::
    Tensor Parallelism APIs are experimental and subject to change.

The entrypoint to parallelize your ``nn.Module`` using Tensor Parallelism is:

.. automodule:: torch.distributed.tensor.parallel

.. currentmodule:: torch.distributed.tensor.parallel

.. autofunction:: parallelize_module
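
For example, the sketch below parallelizes a single linear sub-module of a toy
model across a 1-D device mesh. The model, its sub-module name (``"proj"``),
and the mesh size are illustrative only; the sketch also assumes the default
process group has already been initialized (e.g. via ``torchrun``) and that
``init_device_mesh`` is available as in recent releases.

.. code:: python

    import torch.nn as nn
    from torch.distributed.device_mesh import init_device_mesh
    from torch.distributed.tensor.parallel import ColwiseParallel, parallelize_module

    class ToyModel(nn.Module):
        # Illustrative model, not part of the API.
        def __init__(self):
            super().__init__()
            self.proj = nn.Linear(16, 32)

        def forward(self, x):
            return self.proj(x)

    # 1-D mesh over 4 ranks; launch with e.g. torchrun --nproc_per_node=4.
    tp_mesh = init_device_mesh("cuda", (4,))
    model = ToyModel().cuda()
    # Shard proj's weight column-wise across the mesh.
    model = parallelize_module(model, tp_mesh, {"proj": ColwiseParallel()})
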
Tensor Parallelism supports the following parallel styles:

.. autoclass:: torch.distributed.tensor.parallel.ColwiseParallel
    :members:
    :undoc-members:

.. autoclass:: torch.distributed.tensor.parallel.RowwiseParallel
    :members:
    :undoc-members:
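
By default, ``ColwiseParallel`` expects replicated inputs and produces outputs
sharded on the last tensor dimension, while ``RowwiseParallel`` expects inputs
sharded on the last dimension and produces replicated outputs. In recent
releases both styles also accept ``input_layouts`` and ``output_layouts``
keyword arguments to override these defaults; the sketch below assumes that
API (and the placement types from the private ``torch.distributed._tensor``
module), and the sub-module names in the plan are hypothetical.

.. code:: python

    # Sketch: customize the layouts a style expects/produces. Keyword arguments
    # follow recent PyTorch releases and may differ in older versions.
    from torch.distributed._tensor import Replicate
    from torch.distributed.tensor.parallel import ColwiseParallel, RowwiseParallel

    plan = {
        # Gather the column-wise sharded output back to a replicated tensor so
        # downstream unparallelized code sees an ordinary full result.
        "proj": ColwiseParallel(output_layouts=Replicate()),
        # RowwiseParallel's defaults (input sharded on the last dim, replicated
        # output) pair naturally with a preceding ColwiseParallel layer.
        "out_proj": RowwiseParallel(),
    }

The resulting ``plan`` would then be passed as the ``parallelize_plan`` argument
of ``parallelize_module``.
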
To simply configure the ``nn.Module``'s inputs and outputs with DTensor layouts
and perform the necessary layout redistributions, without distributing the module
parameters to DTensors, the following classes can be used in
the ``parallelize_plan`` of ``parallelize_module``:

.. autoclass:: torch.distributed.tensor.parallel.PrepareModuleInput
    :members:
    :undoc-members:

.. autoclass:: torch.distributed.tensor.parallel.PrepareModuleOutput
    :members:
    :undoc-members:
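
For example, the plan below marks a block's input as sharded on the sequence
dimension and asks for it to be replicated before the block runs, and re-shards
another block's replicated output on the sequence dimension. The sub-module
names (``attention``, ``mlp``) are hypothetical, and the keyword arguments
follow recent releases; no parameters are sharded by these two styles.

.. code:: python

    # Sketch: declare DTensor layouts at module boundaries only.
    from torch.distributed._tensor import Replicate, Shard
    from torch.distributed.tensor.parallel import (
        PrepareModuleInput,
        PrepareModuleOutput,
        parallelize_module,
    )

    plan = {
        # The attention block receives a tensor sharded on the sequence dim
        # (dim 1) and needs it replicated before running.
        "attention": PrepareModuleInput(
            input_layouts=(Shard(1),),
            desired_input_layouts=(Replicate(),),
        ),
        # The MLP block produces a replicated output; hand it back to the
        # caller sharded on the sequence dim.
        "mlp": PrepareModuleOutput(
            output_layouts=(Replicate(),),
            desired_output_layouts=(Shard(1),),
        ),
    }
    # model and tp_mesh are assumed to exist; sub-module names must match yours.
    model = parallelize_module(model, tp_mesh, plan)
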
For models like the Transformer, we recommend using ``ColwiseParallel``
and ``RowwiseParallel`` together in the ``parallelize_plan`` to achieve the desired
sharding for the entire model (i.e., Attention and MLP), as sketched below.
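
For instance, a Megatron-style per-layer plan might shard the "fan-out"
projections column-wise and the "fan-in" projections row-wise, so that each
Attention/MLP pair needs only a single all-reduce in the forward pass. The
fully qualified names below are hypothetical and must match your model;
``model`` and ``tp_mesh`` are assumed to exist as in the earlier sketches.

.. code:: python

    # Sketch of a plan for one Transformer layer. Sub-module names are
    # hypothetical; replace them with the FQNs used by your model.
    from torch.distributed.tensor.parallel import (
        ColwiseParallel,
        RowwiseParallel,
        parallelize_module,
    )

    layer_plan = {
        # Attention: fan-out projections column-wise, output projection row-wise.
        "attention.wq": ColwiseParallel(),
        "attention.wk": ColwiseParallel(),
        "attention.wv": ColwiseParallel(),
        "attention.wo": RowwiseParallel(),
        # MLP: up-projection column-wise, down-projection row-wise.
        "mlp.w1": ColwiseParallel(),
        "mlp.w2": RowwiseParallel(),
    }

    # Apply the same plan to every Transformer block (assumes the blocks live
    # in a container such as model.layers).
    for block in model.layers:
        parallelize_module(block, tp_mesh, layer_plan)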