:::{default-domain} bzl
:::
# Using dependencies from PyPI
Using PyPI packages (aka "pip install") involves two main steps.
1. [Installing third party packages](#installing-third-party-packages)
2. [Using third party packages as dependencies](#using-third-party-packages)
{#installing-third-party-packages}
## Installing third party packages
### Using bzlmod
To add pip dependencies to your `MODULE.bazel` file, use the `pip` extension
and call `pip.parse` to create the central external repo and the individual wheel
external repos. Also include the toolchain extension in `MODULE.bazel`, as shown
in the first bzlmod example above.
```starlark
pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")
pip.parse(
    hub_name = "my_deps",
    python_version = "3.11",
    requirements_lock = "//:requirements_lock_3_11.txt",
)
use_repo(pip, "my_deps")
```
For more documentation, including how the rules can update/create a requirements
file, see the bzlmod examples under the {gh-path}`examples` folder or the documentation
for the {obj}`@rules_python//python/extensions:pip.bzl` extension.
```{note}
We are using a host-platform compatible toolchain by default to set up pip dependencies.
During the setup phase, we create some symlinks, which may be inefficient on Windows
by default. In that case, use the following `.bazelrc` option to improve performance if
you have admin privileges:

    startup --windows_enable_symlinks

This will enable symlinks on Windows and help with the bootstrap performance of setting up the
hermetic host Python interpreter on this platform. Linux and OSX users should see no
difference.
```
### Using a WORKSPACE file
To add pip dependencies to your `WORKSPACE`, load the `pip_parse` function and
call it to create the central external repo and individual wheel external repos.
```starlark
load("@rules_python//python:pip.bzl", "pip_parse")
# Create a central repo that knows about the dependencies needed from
# requirements_lock.txt.
pip_parse(
    name = "my_deps",
    requirements_lock = "//path/to:requirements_lock.txt",
)
# Load the starlark macro, which will define your dependencies.
load("@my_deps//:requirements.bzl", "install_deps")
# Call it to define repos for your requirements.
install_deps()
```
(vendoring-requirements)=
#### Vendoring the requirements.bzl file
In some cases you may not want to generate the requirements.bzl file as a repository rule
while Bazel is fetching dependencies. For example, if you produce a reusable Bazel module
such as a ruleset, you may want to include the requirements.bzl file rather than make your users
install the WORKSPACE setup to generate it.
See https://github.com/bazelbuild/rules_python/issues/608 for background.
This is the same workflow as Gazelle, which creates `go_repository` rules with
[`update-repos`](https://github.com/bazelbuild/bazel-gazelle#update-repos).
To do this, use the "write to source file" pattern documented in
https://blog.aspect.dev/bazel-can-write-to-the-source-folder
to put a copy of the generated requirements.bzl into your project.
Then load the requirements.bzl file directly rather than from the generated repository.
See the example in rules_python/examples/pip_parse_vendored.
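As a minimal sketch of the last step, assuming the vendored copy was written to the workspace root as `requirements.bzl` (the location is an assumption; use wherever your project keeps the file), the `WORKSPACE` load changes from the generated repository to the checked-in file:

```starlark
# WORKSPACE
# Load the checked-in copy instead of "@my_deps//:requirements.bzl".
# The path "//:requirements.bzl" is an assumption for illustration.
load("//:requirements.bzl", "install_deps")

# Define the wheel repos exactly as before.
install_deps()
```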
(per-os-arch-requirements)=
### Requirements for a specific OS/Architecture
In some cases you may need to use different requirements files for different OS and architecture combinations. This is enabled via the `requirements_by_platform` attribute of the `pip.parse` extension and the `pip_parse` repository rule. The keys of the dictionary are labels of the requirements files, and each value is a comma-separated list of target (os, arch) tuples, where `*` can stand in for the architecture.
For example:
```starlark
    # ...
    requirements_by_platform = {
        "requirements_linux_x86_64.txt": "linux_x86_64",
        "requirements_osx.txt": "osx_*",
        "requirements_linux_exotic.txt": "linux_exotic",
        "requirements_some_platforms.txt": "linux_aarch64,windows_*",
    },
    # For the list of standard platforms that the rules_python has toolchains for, default to
    # the following requirements file.
    requirements_lock = "requirements_lock.txt",
```
In case of duplicate platforms, `rules_python` will raise an error, as there has
to be an unambiguous mapping of the requirement files to the (os, arch) tuples.
An alternative way is to use per-OS requirement attributes.
```starlark
    # ...
    requirements_windows = "requirements_windows.txt",
    requirements_darwin = "requirements_darwin.txt",
    # For the remaining platforms (which is basically only linux OS), use this file.
    requirements_lock = "requirements_lock.txt",
)
```
### pip rules
Note that since `pip_parse` and `pip.parse` are executed at evaluation time,
Bazel has no information about the Python toolchain and cannot enforce that the
interpreter used to invoke `pip` matches the interpreter used to run
`py_binary` targets. By default, `pip_parse` uses the system command
`"python3"`. To override this, pass in the `python_interpreter` attribute or
`python_interpreter_target` attribute to `pip_parse`. The `pip.parse` `bzlmod` extension
by default uses the hermetic python toolchain for the host platform.
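For illustration, a `WORKSPACE` setup that points `pip_parse` at a hermetic interpreter might look like the sketch below. The toolchain repo name `python_3_11` and the resulting `@python_3_11_host//:python` label are assumptions; check which repository names your `rules_python` version generates.

```starlark
load("@rules_python//python:pip.bzl", "pip_parse")
load("@rules_python//python:repositories.bzl", "python_register_toolchains")

# Register a hermetic toolchain; the repo name "python_3_11" is an assumption.
python_register_toolchains(
    name = "python_3_11",
    python_version = "3.11",
)

pip_parse(
    name = "my_deps",
    # Assumed label of the host interpreter generated for the toolchain above.
    # Without this attribute, `pip_parse` falls back to the system "python3".
    python_interpreter_target = "@python_3_11_host//:python",
    requirements_lock = "//path/to:requirements_lock.txt",
)
```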
You can have multiple `pip_parse`s in the same workspace, or use the pip
extension multiple times when using bzlmod. This configuration will create
multiple external repos that have no relation to one another and may result in
downloading the same wheels numerous times.
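A bzlmod sketch of such a setup, with a second hub whose name (`tools_deps`) and lock file path are hypothetical:

```starlark
pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")

pip.parse(
    hub_name = "my_deps",
    python_version = "3.11",
    requirements_lock = "//:requirements_lock_3_11.txt",
)

# A second, unrelated hub (hypothetical name and lock file). Wheels that
# appear in both lock files may be downloaded once per hub.
pip.parse(
    hub_name = "tools_deps",
    python_version = "3.11",
    requirements_lock = "//tools:requirements_lock.txt",
)

use_repo(pip, "my_deps", "tools_deps")
```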
As with any repository rule, if you would like to ensure that `pip_parse` is
re-executed to pick up a non-hermetic change to your environment (e.g., updating
your system `python` interpreter), you can force it to re-execute by running
`bazel sync --only [pip_parse name]`.
{#using-third-party-packages}
## Using third party packages as dependencies
Each extracted wheel repo contains a `py_library` target representing
the wheel's contents. There are two ways to access this library. The
first uses the `requirement()` function defined in the central
repo's `//:requirements.bzl` file. This function maps a pip package
name to a label:
```starlark
load("@my_deps//:requirements.bzl", "requirement")
py_library(
    name = "mylib",
    srcs = ["mylib.py"],
    deps = [
        ":myotherlib",
        requirement("some_pip_dep"),
        requirement("another_pip_dep"),
    ],
)
```
The `requirement()` function exists to insulate users from
changes to the underlying repository and label strings. However, those
labels are now widely used directly, so they cannot easily change anyway.
On the other hand, using `requirement()` has several drawbacks; see
[this issue][requirements-drawbacks] for an enumeration. If you don't
want to use `requirement()`, you can use the library
labels directly instead. For `pip_parse`, the labels are of the following form:
```starlark
@{name}//{package}
```
Here `name` is the `name` attribute that was passed to `pip_parse` and
`package` is the pip package name with characters that are illegal in
Bazel label names (e.g. `-`, `.`) replaced with `_`. If you need to
update `name` from "old" to "new", then you can run the following
buildozer command:
```shell
buildozer 'substitute deps @old//([^/]+) @new//${1}' //...:*
```
[requirements-drawbacks]: https://github.com/bazelbuild/rules_python/issues/414
### Entry points
If you would like to access [entry points][whl_ep], see the `py_console_script_binary` rule documentation,
which can help you create a `py_binary` target for a particular console script exposed by a package.
[whl_ep]: https://packaging.python.org/specifications/entry-points/
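A minimal sketch of such a target, assuming a hub named `my_deps` whose lock file includes `pylint` (both the hub name and the package are illustrative):

```starlark
load("@rules_python//python/entry_points:py_console_script_binary.bzl", "py_console_script_binary")

# Creates a runnable target for the console script exposed by the package.
py_console_script_binary(
    name = "pylint",
    pkg = "@my_deps//pylint",
)
```

If the package exposes several console scripts, the rule documentation describes how to pick one by name.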
### 'Extras' dependencies
Any 'extras' specified in the requirements lock file will be automatically added
as transitive dependencies of the package. For example, if the lock file requests
`useful_dep` with extras, you'd still just put
`requirement("useful_dep")` or `@pypi//useful_dep`.
### Consuming Wheel Dists Directly
If you need to depend on the wheel dists themselves, for instance, to pass them
to some other packaging tool, you can get a handle to them with the
`whl_requirement` macro. For example:
```starlark
load("@pypi//:requirements.bzl", "whl_requirement")
filegroup(
    name = "whl_files",
    data = [
        # This is equivalent to "@pypi//boto3:whl"
        whl_requirement("boto3"),
    ],
)
```
### Creating a filegroup of files within a whl
The rule {obj}`whl_filegroup` exists as an easy way to extract the necessary files
from a whl file without the need to modify the `BUILD.bazel` contents of the
whl repositories generated via `pip_repository`. Use it similarly to the `filegroup`
above. See the API docs for more information.
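As a rough sketch, assuming `numpy` is in the lock file behind a hub named `pypi` (the package and the pattern are illustrative), extracting the header files from the wheel could look like:

```starlark
load("@rules_python//python:pip.bzl", "whl_filegroup")

# Extracts only the files matching `pattern` from the wheel.
whl_filegroup(
    name = "numpy_includes",
    pattern = "numpy/core/include/numpy",
    whl = "@pypi//numpy:whl",
)
```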
(advance-topics)=
## Advanced topics
(circular-deps)=
### Circular dependencies
Sometimes PyPI packages contain dependency cycles. For instance, a particular
version of `sphinx` (this is no longer the case in the latest version as of
2024-06-02) depends on `sphinxcontrib-serializinghtml`. When using them as
`requirement()`s, as in
```
py_binary(
    name = "doctool",
    ...
    deps = [
        requirement("sphinx"),
    ],
)
```
Bazel will protest because it doesn't support cycles in the build graph --
```
ERROR: .../external/pypi_sphinxcontrib_serializinghtml/BUILD.bazel:44:6: in alias rule @pypi_sphinxcontrib_serializinghtml//:pkg: cycle in dependency graph:
    //:doctool (...)
    @pypi//sphinxcontrib_serializinghtml:pkg (...)
.-> @pypi_sphinxcontrib_serializinghtml//:pkg (...)
|   @pypi_sphinxcontrib_serializinghtml//:_pkg (...)
|   @pypi_sphinx//:pkg (...)
|   @pypi_sphinx//:_pkg (...)
`-- @pypi_sphinxcontrib_serializinghtml//:pkg (...)
```
The `experimental_requirement_cycles` argument allows you to work around these
issues by specifying groups of packages which form cycles. `pip_parse` will
transparently fix the cycles for you and provide the cyclic dependencies
simultaneously.
```starlark
pip_parse(
    ...
    experimental_requirement_cycles = {
        "sphinx": [
            "sphinx",
            "sphinxcontrib-serializinghtml",
        ],
    },
)
```
`pip_parse` supports fixing multiple cycles simultaneously; however, the cycles must
be distinct. `apache-airflow`, for instance, has dependency cycles with a number
of its optional dependencies, which means those optional dependencies must all
be a part of the `airflow` cycle. For instance:
```starlark
pip_parse(
    ...
    experimental_requirement_cycles = {
        "airflow": [
            "apache-airflow",
            "apache-airflow-providers-common-sql",
            "apache-airflow-providers-postgres",
            "apache-airflow-providers-sqlite",
        ],
    },
)
```
Alternatively, one could resolve the cycle by removing one leg of it.
For example, while `apache-airflow-providers-sqlite` is "baked into" the Airflow
package, `apache-airflow-providers-postgres` is not and is an optional feature.
Rather than listing `apache-airflow[postgres]` in your `requirements.txt`, which
would expose a cycle via the extra, one could _manually_ depend on
`apache-airflow` and `apache-airflow-providers-postgres` separately as
requirements. Bazel rules which need only `apache-airflow` can take it as a
dependency, and rules which explicitly want to mix in
`apache-airflow-providers-postgres` now can.
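A sketch of that manual approach, assuming both packages are listed separately in the lock file and the hub is named `pypi` as in the error output above (target and file names are hypothetical):

```starlark
load("@pypi//:requirements.bzl", "requirement")

# Needs only the core Airflow package.
py_library(
    name = "dag_utils",
    srcs = ["dag_utils.py"],
    deps = [requirement("apache-airflow")],
)

# Explicitly mixes in the optional Postgres provider.
py_library(
    name = "postgres_dags",
    srcs = ["postgres_dags.py"],
    deps = [
        requirement("apache-airflow"),
        requirement("apache-airflow-providers-postgres"),
    ],
)
```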
Alternatively, one could use `rules_python`'s patching features to remove one
leg of the dependency manually. For instance, one could make
`apache-airflow-providers-postgres` not explicitly depend on `apache-airflow` or
perhaps `apache-airflow-providers-common-sql`.
### Multi-platform support
Cross-building wheels for multiple platforms can be supported in two ways: either
by using {bzl:attr}`experimental_index_url` for the {bzl:obj}`pip.parse` bzlmod tag class,
or by using the {bzl:attr}`pip.parse.download_only` setting. This section
briefly outlines how to use the latter option.
Let's say you have 2 requirements files:
```
# requirements.linux_x86_64.txt
--platform=manylinux_2_17_x86_64
--python-version=39
--implementation=cp
--abi=cp39
foo==0.0.1 --hash=sha256:deadbeef
bar==0.0.1 --hash=sha256:deadb00f
```
```
# requirements.osx_aarch64.txt contents
--platform=macosx_10_9_arm64
--python-version=39
--implementation=cp
--abi=cp39
foo==0.0.3 --hash=sha256:deadbaaf
```
With these 2 files your {bzl:obj}`pip.parse` could look like:
```
pip.parse(
    hub_name = "pip",
    python_version = "3.9",
    # Tell `pip` to ignore sdists
    download_only = True,
    requirements_by_platform = {
        "requirements.linux_x86_64.txt": "linux_x86_64",
        "requirements.osx_aarch64.txt": "osx_aarch64",
    },
)
```
With this, `pip.parse` will create a hub repository that supports only two
platforms, `cp39_osx_aarch64` and `cp39_linux_x86_64`, and it will only use
wheels and ignore any sdists that it may find on the PyPI-compatible
indexes.
```{note}
This is only supported on `bzlmod`.
```
(bazel-downloader)=
### Bazel downloader and multi-platform wheel hub repository
The `bzlmod` `pip.parse` call supports pulling information from `PyPI` (or a
compatible mirror) and it will ensure that the [bazel
downloader][bazel_downloader] is used for downloading the wheels. This allows
the users to use the [credential helper](#credential-helper) to authenticate
with the mirror and it also ensures that the distribution downloads are cached.
It also avoids using `pip` altogether and results in much faster dependency
fetching.
This can be enabled by `experimental_index_url` and related flags as shown in
the {gh-path}`examples/bzlmod/MODULE.bazel` example.
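As a minimal sketch, enabling the feature for the hub defined earlier could look like the following; the index URL shown is the public PyPI Simple API endpoint, and a mirror URL would be substituted in the same place:

```starlark
pip.parse(
    hub_name = "my_deps",
    python_version = "3.11",
    requirements_lock = "//:requirements_lock_3_11.txt",
    # Use the Bazel downloader against this index instead of invoking `pip`.
    experimental_index_url = "https://pypi.org/simple",
)
```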
When this feature is enabled, the `pip` extension evaluation prints the indexes it accesses, similar to the output below:
```console
Loading: 0 packages loaded
currently loading: docs/
Fetching module extension pip in @@//python/extensions:pip.bzl; starting
Fetching https://pypi.org/simple/twine/
```
This does not mean that `rules_python` is fetching the wheels eagerly. Rather,
it is calling the PyPI server to get the Simple API response, which lists
all available source and wheel distributions. Once it has the full list,
it selects the right distributions based
on the `sha256` values in your `requirements_lock.txt` file. The compatible
distribution URLs are then written to the `MODULE.bazel.lock` file. Currently,
users who want to use the lock file with this `rules_python` feature have
to set the environment variable `RULES_PYTHON_OS_ARCH_LOCK_FILE=0`, which will
become the default in the next release.
Fetching the distribution information from PyPI allows `rules_python` to
know which `whl` should be used on which target platform; it determines
this by parsing the `whl` filename according to the [PEP600] and [PEP656] standards. This
allows the user to configure the behaviour by using the following publicly
available flags:
* {obj}`--@rules_python//python/config_settings:py_linux_libc` for selecting the Linux libc variant.
* {obj}`--@rules_python//python/config_settings:pip_whl` for selecting `whl` distribution preference.
* {obj}`--@rules_python//python/config_settings:pip_whl_osx_arch` for selecting macOS wheel preference.
* {obj}`--@rules_python//python/config_settings:pip_whl_glibc_version` for selecting the GLIBC version compatibility.
* {obj}`--@rules_python//python/config_settings:pip_whl_muslc_version` for selecting the musl version compatibility.
* {obj}`--@rules_python//python/config_settings:pip_whl_osx_version` for selecting macOS version compatibility.
[bazel_downloader]: https://bazel.build/rules/lib/builtins/repository_ctx#download
[pep600]: https://peps.python.org/pep-0600/
[pep656]: https://peps.python.org/pep-0656/
(credential-helper)=
### Credential Helper
The "use Bazel downloader for python wheels" experimental feature includes support for the Bazel
[Credential Helper][cred-helper-design].
Your python artifact registry may provide a credential helper for you. Refer to your index's docs
to see if one is provided.
See the [Credential Helper Spec][cred-helper-spec] for details.
[cred-helper-design]: https://github.com/bazelbuild/proposals/blob/main/designs/2022-06-07-bazel-credential-helpers.md
[cred-helper-spec]: https://github.com/EngFlow/credential-helper-spec/blob/main/spec.md
#### Basic Example:
The simplest form of a credential helper is a bash script that accepts an arg and spits out JSON to
stdout. For a service like Google Artifact Registry that uses ['Basic' HTTP Auth][rfc7617] and does
not provide a credential helper that conforms to the [spec][cred-helper-spec], the script might
look like:
```bash
#!/bin/bash
# cred_helper.sh
ARG=$1 # but we don't do anything with it as it's always "get"
# formatting is optional
echo '{'
echo ' "headers": {'
echo ' "Authorization": ["Basic dGVzdDoxMjPCow=="]'
echo ' }'
echo '}'
```
Configure Bazel to use this credential helper for your python index `example.com`:
```
# .bazelrc
build --credential_helper=example.com=/full/path/to/cred_helper.sh
```
Bazel will call this file like `cred_helper.sh get` and use the returned JSON to inject headers
into whatever HTTP(S) request it performs against `example.com`.
[rfc7617]: https://datatracker.ietf.org/doc/html/rfc7617