Commit 4725d98

arrdem and aignas authored
feat(pip_repository): Support pip parse cycles (#1166)
This patch reworks the `pip_repository` machinery to allow users to manually annotate groups of libraries which form packaging cycles on PyPI and must be installed simultaneously.

The strategy here is to transform any mutually dependent packages `A` and `B`, each with further dependencies of its own

```mermaid
graph LR;
  A-->B;
  A-->D;
  A-->E;
  B-->A;
  B-->F;
  B-->G;
```

into a new "dependency group" `C` which has `A*` and `B*` as dependencies, defined as `A` and `B` less any direct dependencies which are members of the group. This is viable _for Python_ because Python files just need to be emplaced into a runfiles directory for the interpreter. There is no true hard dependency in which the build definition of `A` requires the build product of `B` to be available, which in turn requires the build product of `A` to be available.

```mermaid
graph LR;
  C-->A*;
  A*-->D;
  A*-->E;
  C-->B*;
  B*-->F;
  B*-->G;
```

This gets us most of the way there: a user can now safely write `requirement("A")` and we can provide them with `C`, which has the desired effect of pulling in `A`, `B`, and their respective transitive dependencies.

One problem remains: a user writing `deps = [requirement("A"), requirement("B")]` would take a double direct dependency on `C`. So we insert a layer of indirection, generating `C_A` and `C_B`, which serve only as unique aliases for `C` so that we can support the double dependency. Our final dependency graph is then as follows:

```mermaid
graph LR;
  C_A-->C;
  C_B-->C;
  C-->A*;
  A*-->D;
  A*-->E;
  C-->B*;
  B*-->F;
  B*-->G;
```

Addresses #1076, #1188

## To do

- [x] Get rebased
- [x] Get re-validated manually
- [x] Buildifier
- [x] Get CI happy
- [x] Update documentation
- [x] Update changelog

---------

Co-authored-by: Ignas Anikevicius <240938+aignas@users.noreply.github.com>
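The group rewrite described above can be sketched in plain Python. This is a hypothetical illustration of the transform, not the actual `pip_repository` implementation; the dependency graph is modeled as a dict mapping package name to a set of direct dependencies, and the names `break_cycles`, `group_name`, and `group_members` are invented for the example:

```python
def break_cycles(deps, group_name, group_members):
    """Rewrite a dependency graph so a cyclic group is installable.

    deps: dict mapping package name -> set of direct dependency names.
    Returns a new dict in which each group member keeps only its
    out-of-group deps (the A*, B* of the commit message), a synthetic
    group target (C) depends on all members, and per-member aliases
    (C_A, C_B) point at the group so double dependencies are tolerated.
    """
    members = set(group_members)
    out = {}
    for pkg, pkg_deps in deps.items():
        if pkg in members:
            # A* = A minus any direct deps that are inside the group.
            out[pkg] = pkg_deps - members
        else:
            out[pkg] = set(pkg_deps)
    # C depends on every stripped member (A*, B*, ...).
    out[group_name] = set(members)
    # Unique aliases C_A, C_B, each depending only on C.
    for pkg in members:
        out["%s_%s" % (group_name, pkg)] = {group_name}
    return out
```

With the `A`/`B` example from the commit message, `break_cycles(deps, "C", ["A", "B"])` leaves `A` depending on `D` and `E` only, `B` on `F` and `G` only, and introduces `C`, `C_A`, and `C_B` exactly as in the final graph above.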
1 parent 69abe80 commit 4725d98

37 files changed: 12,534 additions & 124 deletions

CHANGELOG.md (5 additions & 0 deletions)

```diff
@@ -59,6 +59,11 @@ A brief description of the categories of changes:
   `data`. Note, that the `@pypi_foo//:pkg` labels are still present for
   backwards compatibility.

+* (pip_parse) The parameter `requirement_cycles` may be provided a map of names
+  to lists of requirements which form a dependency cycle. `pip_parse` will break
+  the cycle for you transparently. This behavior is also available under bzlmod
+  as `pip.parse(requirement_cycles={})`.
+
 * (gazelle) The flag `use_pip_repository_aliases` is now set to `True` by
   default, which will cause `gazelle` to change third-party dependency labels
   from `@pip_foo//:pkg` to `@pip//foo` by default.
```
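As a sketch of the bzlmod form mentioned in the changelog entry, a `MODULE.bazel` fragment might look like the following. The extension path, `hub_name`, and lock-file label here are assumptions for illustration, and the attribute may be spelled `experimental_requirement_cycles` in some releases:

```starlark
# Hypothetical MODULE.bazel fragment -- extension path and repo names assumed.
pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")
pip.parse(
    hub_name = "pypi",
    requirements_lock = "//:requirements_lock.txt",
    # Group the mutually dependent sphinx packages so the cycle can be broken.
    requirement_cycles = {
        "sphinx": [
            "sphinx",
            "sphinxcontrib-serializinghtml",
        ],
    },
)
use_repo(pip, "pypi")
```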

docs/sphinx/pypi-dependencies.md (80 additions & 0 deletions)

````diff
@@ -130,6 +130,86 @@ Any 'extras' specified in the requirements lock file will be automatically added
 as transitive dependencies of the package. In the example above, you'd just put
 `requirement("useful_dep")`.

+### Packaging cycles
+
+Sometimes PyPi packages contain dependency cycles -- for instance `sphinx`
+depends on `sphinxcontrib-serializinghtml`. When using them as `requirement()`s,
+as in
+
+```
+py_binary(
+    name = "doctool",
+    ...
+    deps = [
+        requirement("sphinx"),
+    ]
+)
+```
+
+Bazel will protest because it doesn't support cycles in the build graph --
+
+```
+ERROR: .../external/pypi_sphinxcontrib_serializinghtml/BUILD.bazel:44:6: in alias rule @pypi_sphinxcontrib_serializinghtml//:pkg: cycle in dependency graph:
+    //:doctool (...)
+    @pypi//sphinxcontrib_serializinghtml:pkg (...)
+.-> @pypi_sphinxcontrib_serializinghtml//:pkg (...)
+|   @pypi_sphinxcontrib_serializinghtml//:_pkg (...)
+|   @pypi_sphinx//:pkg (...)
+|   @pypi_sphinx//:_pkg (...)
+`-- @pypi_sphinxcontrib_serializinghtml//:pkg (...)
+```
+
+The `requirement_cycles` argument allows you to work around these issues by
+specifying groups of packages which form cycles. `pip_parse` will transparently
+fix the cycles for you and provide the cyclic dependencies simultaneously.
+
+```
+pip_parse(
+    ...
+    requirement_cycles = {
+        "sphinx": [
+            "sphinx",
+            "sphinxcontrib-serializinghtml",
+        ]
+    },
+)
+```
+
+`pip_parse` supports fixing multiple cycles simultaneously; however, cycles must
+be distinct. `apache-airflow`, for instance, has dependency cycles with a number
+of its optional dependencies, which means those optional dependencies must all
+be part of the `airflow` cycle. For instance --
+
+```
+pip_parse(
+    ...
+    requirement_cycles = {
+        "airflow": [
+            "apache-airflow",
+            "apache-airflow-providers-common-sql",
+            "apache-airflow-providers-postgres",
+            "apache-airflow-providers-sqlite",
+        ]
+    }
+)
+```
+
+Alternatively, one could resolve the cycle by removing one leg of it.
+
+For example, while `apache-airflow-providers-sqlite` is "baked into" the Airflow
+package, `apache-airflow-providers-postgres` is not and is an optional feature.
+Rather than listing `apache-airflow[postgres]` in your `requirements.txt`, which
+would expose a cycle via the extra, one could manually depend on
+`apache-airflow` and `apache-airflow-providers-postgres` separately as
+requirements. Bazel rules which need only `apache-airflow` can take it as a
+dependency, and rules which explicitly want to mix in
+`apache-airflow-providers-postgres` now can.
+
+Alternatively, one could use `rules_python`'s patching features to remove one
+leg of the dependency manually -- for instance by making
+`apache-airflow-providers-postgres` not explicitly depend on `apache-airflow` or
+perhaps `apache-airflow-providers-common-sql`.
+
 ## Consuming Wheel Dists Directly

 If you need to depend on the wheel dists themselves, for instance, to pass them
````
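To figure out which packages actually participate in a cycle, and therefore belong together in one `requirement_cycles` group, one can run a strongly-connected-components pass over the resolved dependency graph. The following is a minimal sketch using Tarjan's algorithm; the helper `find_cycle_groups` is hypothetical and not part of `pip_parse`, and it assumes you can obtain the graph as a plain dict (e.g. from `pip show` or lock-file metadata):

```python
def find_cycle_groups(deps):
    """Return sorted groups of packages that form dependency cycles.

    deps: dict mapping package name -> iterable of dependency names.
    Any strongly connected component with more than one member (or a
    self-loop) is a cycle and a candidate requirement_cycles group.
    """
    index = {}       # discovery order per node
    lowlink = {}     # smallest index reachable from the node's subtree
    on_stack = set()
    stack = []
    counter = [0]
    groups = []

    def strongconnect(v):
        index[v] = lowlink[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in deps.get(v, ()):
            if w not in index:
                strongconnect(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:
            # v is the root of a strongly connected component.
            scc = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.append(w)
                if w == v:
                    break
            if len(scc) > 1 or v in deps.get(v, ()):
                groups.append(sorted(scc))

    for v in list(deps):
        if v not in index:
            strongconnect(v)
    return groups
```

Running this over a graph containing the `sphinx` / `sphinxcontrib-serializinghtml` cycle yields exactly that pair as one group, while acyclic packages are left out.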

examples/build_file_generation/BUILD.bazel (1 addition & 0 deletions)

```diff
@@ -65,6 +65,7 @@ py_library(
     deps = [
         "//random_number_generator",
         "@pip//flask",
+        "@pip//sphinx",
     ],
 )
```
examples/build_file_generation/WORKSPACE (13 additions & 0 deletions)

```diff
@@ -94,6 +94,19 @@ load("@rules_python//python:pip.bzl", "pip_parse")
 # You can instead check this `requirements.bzl` file into your repo.
 pip_parse(
     name = "pip",
+
+    # Requirement groups allow Bazel to tolerate PyPi cycles by putting dependencies
+    # which are known to form cycles into groups together.
+    experimental_requirement_cycles = {
+        "sphinx": [
+            "sphinx",
+            "sphinxcontrib-qthelp",
+            "sphinxcontrib-htmlhelp",
+            "sphinxcontrib-devhelp",
+            "sphinxcontrib-applehelp",
+            "sphinxcontrib-serializinghtml",
+        ],
+    },
     # (Optional) You can provide a python_interpreter (path) or a python_interpreter_target (a Bazel target, that
     # acts as an executable). The latter can be anything that could be used as Python interpreter. E.g.:
     # 1. Python interpreter that you compile in the build file.
```

examples/build_file_generation/__init__.py (1 addition & 0 deletions)

```diff
@@ -14,6 +14,7 @@

 from flask import Flask, jsonify
 from random_number_generator import generate_random_number
+import sphinx  # noqa

 app = Flask(__name__)
```