 PyTorch 2.4: Getting Started on Intel GPU
 =========================================

 Support for Intel GPUs is released alongside PyTorch v2.4.

 This release supports Intel GPUs only when PyTorch is built from source.

 Hardware Prerequisites
 ----------------------

 .. list-table::
    :stub-columns: 1

    * - Supported Hardware
      - Intel® Data Center GPU Max Series
    * - Supported OS
      - Linux


 PyTorch 2.4 for Intel GPUs supports the Intel® Data Center GPU Max Series and runs only on Linux.

 Software Prerequisites
 ----------------------

 As a prerequisite, install the driver and required packages by following the `PyTorch Installation Prerequisites for Intel GPUs <https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpus.html>`_.

 Set up Environment
 ------------------

 Before you begin, set up the environment by sourcing the ``setvars.sh`` script provided by the ``intel-for-pytorch-gpu-dev`` and ``intel-pti-dev`` packages.

 .. code-block::

    source ${ONEAPI_ROOT}/setvars.sh

 .. note::
    ``ONEAPI_ROOT`` is the folder where you installed the ``intel-for-pytorch-gpu-dev`` and ``intel-pti-dev`` packages. Typically, it is located at ``/opt/intel/oneapi/`` or ``~/intel/oneapi/``.
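
 If you are unsure where the packages were installed, a small script can look for ``setvars.sh`` in the locations mentioned above. This is only a sketch: ``find_setvars`` and ``DEFAULT_ROOTS`` are hypothetical names introduced here, and the search list assumes the two typical install folders from the note.

```python
import os
from pathlib import Path

# Typical install folders named in the note above (an assumption; adjust as needed).
DEFAULT_ROOTS = ("/opt/intel/oneapi", "~/intel/oneapi")


def find_setvars(env=None, roots=DEFAULT_ROOTS):
    """Return the first existing setvars.sh path, or None if none is found."""
    env = os.environ if env is None else env
    candidates = []
    # An explicitly set ONEAPI_ROOT takes priority over the typical folders.
    if env.get("ONEAPI_ROOT"):
        candidates.append(env["ONEAPI_ROOT"])
    candidates.extend(roots)
    for root in candidates:
        script = Path(root).expanduser() / "setvars.sh"
        if script.is_file():
            return script
    return None


if __name__ == "__main__":
    script = find_setvars()
    if script is None:
        print("setvars.sh not found; check where the oneAPI packages were installed")
    else:
        print(f"source {script}")
```

 The script only prints the ``source`` command. The environment still has to be activated in your shell, because a Python process cannot change the environment of the shell that started it.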

 Build from source
 -----------------

 Now that all the required packages are installed and the environment is activated, use the following commands to install ``pytorch``, ``torchvision``, and ``torchaudio`` by building from source. For more details, refer to the official guides: `PyTorch from source <https://github.com/pytorch/pytorch?tab=readme-ov-file#intel-gpu-support>`_, `Vision from source <https://github.com/pytorch/vision/blob/main/CONTRIBUTING.md#development-installation>`_ and `Audio from source <https://pytorch.org/audio/main/build.linux.html>`_.

 .. code-block::

    # Get PyTorch Source Code
    git clone --recursive https://github.com/pytorch/pytorch
    cd pytorch
    git checkout main # or checkout the specific release version >= v2.4
    git submodule sync
    git submodule update --init --recursive

    # Get required packages for compilation
    conda install cmake ninja
    pip install -r requirements.txt

    # PyTorch for Intel GPUs only supports the Linux platform for now.
    # Install the required packages for pytorch compilation.
    conda install intel::mkl-static intel::mkl-include

    # (optional) If using torch.compile with inductor/triton, install the matching version of triton
    # Run from the pytorch directory after cloning
    # For Intel GPU support, explicitly set USE_XPU=1 when running the command.
    USE_XPU=1 make triton

    # If you would like to compile PyTorch with new C++ ABI enabled, then first run this command:
    export _GLIBCXX_USE_CXX11_ABI=1

    # pytorch build from source
    export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
    python setup.py develop
    cd ..

    # (optional) If using torchvision.
    # Get torchvision Code
    git clone https://github.com/pytorch/vision.git
    cd vision
    git checkout main # or specific version
    python setup.py develop
    cd ..

    # (optional) If using torchaudio.
    # Get torchaudio Code
    git clone https://github.com/pytorch/audio.git
    cd audio
    pip install -r requirements.txt
    git checkout main # or specific version
    git submodule sync
    git submodule update --init --recursive
    python setup.py develop
    cd ..
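
 A build can fail late if one of the command-line tools used above is missing. As a rough pre-flight check, the sketch below reports which tools are not on ``PATH``. The helper ``missing_tools`` and the ``BUILD_TOOLS`` list are hypothetical and simply mirror the commands in the block above. Run it after the ``conda install cmake ninja`` step, since that step provides two of the tools.

```python
import shutil

# Command-line tools invoked by the build commands above.
BUILD_TOOLS = ("git", "conda", "pip", "cmake", "ninja", "make", "python")


def missing_tools(tools=BUILD_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All build tools found")
```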

 Check availability for Intel GPU
 --------------------------------

 .. note::
    Make sure the environment is properly set up by following `Set up Environment <#set-up-environment>`_ before running the code.

 To check whether your Intel GPU is available, use the following code:

 .. code-block::

    import torch
    torch.xpu.is_available()  # torch.xpu is the API for Intel GPU support

 If the output is ``False``, ensure that your system has an Intel GPU and that you correctly followed the `PyTorch Installation Prerequisites for Intel GPUs <https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpus.html>`_. Then, check that the PyTorch build completed successfully.
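
 When troubleshooting, a slightly more verbose check can help separate "PyTorch is not installed", "this build has no XPU device" and "devices are visible". The following is a minimal sketch, not an official diagnostic. ``xpu_report`` is a hypothetical helper, and it assumes the ``torch.xpu`` functions ``is_available``, ``device_count`` and ``get_device_name``.

```python
import importlib.util


def xpu_report():
    """Return a short human-readable summary of Intel GPU visibility."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch

    # Older builds may not ship torch.xpu at all, so look it up defensively.
    xpu = getattr(torch, "xpu", None)
    if xpu is None or not xpu.is_available():
        return f"torch {torch.__version__}: no XPU device available"
    names = [xpu.get_device_name(i) for i in range(xpu.device_count())]
    return f"torch {torch.__version__}: {len(names)} XPU device(s): {', '.join(names)}"


if __name__ == "__main__":
    print(xpu_report())
```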

 Minimum Code Change
 -------------------

 If you are migrating code from ``cuda``, change references from ``cuda`` to ``xpu``. For example:

 .. code-block::

    # CUDA CODE
    tensor = torch.tensor([1.0, 2.0]).to("cuda")

    # CODE for Intel GPU
    tensor = torch.tensor([1.0, 2.0]).to("xpu")
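
 If the same script has to run on machines with different accelerators, one option is to choose the device string at run time instead of hard-coding it. The sketch below is one possible way to do that. ``pick_device_name`` is a hypothetical helper, and the preference order (``xpu``, then ``cuda``, then ``cpu``) is an assumption rather than a PyTorch convention.

```python
def pick_device_name(torch_module, prefer=("xpu", "cuda")):
    """Return the first available backend name from `prefer`, else "cpu"."""
    for name in prefer:
        # Each backend lives in a submodule such as torch.xpu or torch.cuda.
        backend = getattr(torch_module, name, None)
        if backend is not None and backend.is_available():
            return name
    return "cpu"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        device = pick_device_name(torch)
        tensor = torch.tensor([1.0, 2.0]).to(device)
        print(tensor.device)
```

 Passing the ``torch`` module in as an argument keeps the helper free of import-time side effects and makes it easy to exercise without a GPU.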

 The following points outline the support and limitations for PyTorch with Intel GPU:

 #. Both training and inference workflows are supported.
 #. Both eager mode and ``torch.compile`` are supported.
 #. Data types such as FP32, BF16, FP16, and Automatic Mixed Precision (AMP) are all supported.
 #. Models that depend on third-party components will not be supported until PyTorch v2.5 or later.

 Examples
 --------

 This section contains usage examples for both inference and training workflows.

 Inference Examples
 ^^^^^^^^^^^^^^^^^^

 Here are a few inference workflow examples.


 Inference with FP32
 """""""""""""""""""

 .. code-block::

    import torch
    import torchvision.models as models

    model = models.resnet50(weights="ResNet50_Weights.DEFAULT")
    model.eval()
    data = torch.rand(1, 3, 224, 224)

    ######## code changes #######
    model = model.to("xpu")
    data = data.to("xpu")
    ######## code changes #######

    with torch.no_grad():
        model(data)

    print("Execution finished")

 Inference with AMP
 """"""""""""""""""

 .. code-block::

    import torch
    import torchvision.models as models

    model = models.resnet50(weights="ResNet50_Weights.DEFAULT")
    model.eval()
    data = torch.rand(1, 3, 224, 224)

    #################### code changes #################
    model = model.to("xpu")
    data = data.to("xpu")
    #################### code changes #################

    with torch.no_grad():
        ############################# code changes #####################
        # set dtype=torch.bfloat16 for BF16
        with torch.autocast(device_type="xpu", dtype=torch.float16, enabled=True):
        ############################# code changes #####################
            model(data)

    print("Execution finished")

 Inference with ``torch.compile``
 """"""""""""""""""""""""""""""""

 .. code-block::

    import torch
    import torchvision.models as models

    model = models.resnet50(weights="ResNet50_Weights.DEFAULT")
    model.eval()
    data = torch.rand(1, 3, 224, 224)
    ITERS = 10

    ######## code changes #######
    model = model.to("xpu")
    data = data.to("xpu")
    ######## code changes #######

    model = torch.compile(model)
    for i in range(ITERS):
        with torch.no_grad():
            model(data)

    print("Execution finished")
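
 With ``torch.compile``, the first call usually includes compilation, so it tends to be much slower than later calls. To see this, the iterations can be timed individually. The sketch below uses a hypothetical helper, ``time_iterations``. Its ``sync`` argument is an assumption about how you would wait for queued device work, for example by passing ``torch.xpu.synchronize``.

```python
import time


def time_iterations(fn, iters, sync=None):
    """Call `fn` `iters` times and return each call's wall-clock seconds.

    `sync`, if given, is called after every iteration so that queued device
    work has finished before the clock is read.
    """
    durations = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without a GPU.
    timings = time_iterations(lambda: sum(range(100_000)), iters=3)
    for i, seconds in enumerate(timings):
        print(f"iteration {i}: {seconds * 1e3:.2f} ms")
```

 In the example above, this could be used as ``time_iterations(lambda: model(data), ITERS, sync=torch.xpu.synchronize)`` inside a ``torch.no_grad()`` block.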

 Training Examples
 ^^^^^^^^^^^^^^^^^

 Here are a few training workflow examples.

 Train with FP32
 """""""""""""""

 .. code-block::

    import torch
    import torchvision

    LR = 0.001
    DOWNLOAD = True
    DATA = "datasets/cifar10/"

    transform = torchvision.transforms.Compose(
        [
            torchvision.transforms.Resize((224, 224)),
            torchvision.transforms.ToTensor(),
            torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
        ]
    )
    train_dataset = torchvision.datasets.CIFAR10(
        root=DATA,
        train=True,
        transform=transform,
        download=DOWNLOAD,
    )
    train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=128)

    model = torchvision.models.resnet50()
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=LR, momentum=0.9)
    model.train()
    ######################## code changes #######################
    model = model.to("xpu")
    criterion = criterion.to("xpu")
    ######################## code changes #######################

    for batch_idx, (data, target) in enumerate(train_loader):
        ########## code changes ##########
        data = data.to("xpu")
        target = target.to("xpu")
        ########## code changes ##########
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        print(batch_idx)
    torch.save(
        {
            "model_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
        },
        "checkpoint.pth",
    )

    print("Execution finished")

 Train with AMP
 """"""""""""""

 .. code-block::

    import torch
    import torchvision

    LR = 0.001
    DOWNLOAD = True
    DATA = "datasets/cifar10/"

    use_amp = True

    transform = torchvision.transforms.Compose(
        [
            torchvision.transforms.Resize((224, 224)),
            torchvision.transforms.ToTensor(),
            torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
        ]
    )
    train_dataset = torchvision.datasets.CIFAR10(
        root=DATA,
        train=True,
        transform=transform,
        download=DOWNLOAD,
    )
    train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=128)

    model = torchvision.models.resnet50()
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=LR, momentum=0.9)
    scaler = torch.amp.GradScaler(enabled=use_amp)

    model.train()
    ######################## code changes #######################
    model = model.to("xpu")
    criterion = criterion.to("xpu")
    ######################## code changes #######################

    for batch_idx, (data, target) in enumerate(train_loader):
        ########## code changes ##########
        data = data.to("xpu")
        target = target.to("xpu")
        ########## code changes ##########
        # set dtype=torch.bfloat16 for BF16
        with torch.autocast(device_type="xpu", dtype=torch.float16, enabled=use_amp):
            output = model(data)
            loss = criterion(output, target)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
        print(batch_idx)

    torch.save(
        {
            "model_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
        },
        "checkpoint.pth",
    )

    print("Execution finished")
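
 Both training examples write ``checkpoint.pth``. To resume from it, the saved state dicts can be loaded back into an existing model and optimizer. The following is a minimal sketch under stated assumptions: ``restore_checkpoint`` and its ``load`` parameter are hypothetical, and the checkpoint is mapped to the CPU first so that ``load_state_dict`` copies the values onto whatever device the model already uses.

```python
def restore_checkpoint(model, optimizer, path="checkpoint.pth", load=None):
    """Load the state dicts written by the training examples into existing objects."""
    if load is None:
        import torch  # imported lazily; only needed when no loader is injected

        load = torch.load
    # Load onto the CPU first; load_state_dict() then copies the values into
    # the existing parameters, which stay on the device the model is on.
    checkpoint = load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model_state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    return checkpoint
```

 Call it after the model has been moved to ``"xpu"`` and the optimizer has been created, for example ``restore_checkpoint(model, optimizer)``.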