Torch Distributed Elastic
============================

Makes distributed PyTorch fault-tolerant and elastic.

Get Started
---------------

.. toctree::
   :maxdepth: 1
   :caption: Usage

   elastic/quickstart
   elastic/train_script
   elastic/examples

Documentation
---------------

.. toctree::
   :maxdepth: 1
   :caption: API

   elastic/run
   elastic/agent
   elastic/multiprocessing
   elastic/errors
   elastic/rendezvous
   elastic/timer
   elastic/metrics
   elastic/events
   elastic/subprocess_handler
   elastic/control_plane

.. toctree::
   :maxdepth: 1
   :caption: Advanced

   elastic/customization

.. toctree::
   :maxdepth: 1
   :caption: Plugins

   elastic/kubernetes