pants: Interpreter constraints do not play well with lock files.

Say I have interpreter constraints “CPython>=3.6,<4” in play and I want to generate a lockfile for a resolve. There needs to be a lockfile for each interpreter these constraints select, since requirements can have environment markers and these can cause an interpreter-specific resolve. Since environment markers are so broad in scope, different patch versions of an interpreter can have different resolves, and even the same patch version on the same platform can have a different resolve due to fields like platform_version and platform_release: https://www.python.org/dev/peps/pep-0508/#environment-markers
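To make the marker problem concrete, here is a minimal sketch that assembles the PEP 508 marker environment for whichever interpreter runs it, using only the standard library. The function name `marker_environment` is my own and not a Pants or Pex API; the keys are the ones PEP 508 defines, with `implementation_version` left out for brevity.

```python
import os
import platform
import sys


def marker_environment():
    """Return PEP 508 marker values for the running interpreter.

    Hypothetical helper for illustration; `implementation_version` is omitted.
    """
    return {
        "os_name": os.name,
        "sys_platform": sys.platform,
        "platform_machine": platform.machine(),
        "platform_python_implementation": platform.python_implementation(),
        "platform_release": platform.release(),
        "platform_system": platform.system(),
        "platform_version": platform.version(),
        # Only major.minor, e.g. "3.9".
        "python_version": ".".join(platform.python_version_tuple()[:2]),
        # Includes the patch level, e.g. "3.9.1".
        "python_full_version": platform.python_version(),
        "implementation_name": sys.implementation.name,
    }


if __name__ == "__main__":
    for key, value in sorted(marker_environment().items()):
        print(f"{key} = {value}")
```

Running this under two interpreters, or on two machines with different kernels, shows how many of these values can differ. Any requirement with a marker that tests one of them can in principle change the resolve.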

Put more simply: if I have a concrete interpreter in hand, I can run a resolve for it and then generate a lockfile for it. If I don’t, I can’t *.

With interpreter constraints, though, I may only have a subset of the possible interpreters, and so I can only generate a subset of the needed lockfiles. In the leading example, say I try to create the lockfile on a machine with just CPython 3.6.5 on June 13th 2021. I will generate a lockfile for that interpreter and check it in. Now, say, 2 months later I go back to that commit to re-build things, but on a machine with only CPython 3.9.1. I have no lockfile for that interpreter, so one will need to be regenerated. The problem here is that the new lockfile may not be close at all to the one generated 2 months ago. Many new versions of distributions may have been published, and this can result in the new CPython 3.9.1 lockfile behaving quite differently from the CPython 3.6.5 lockfile. Towards the worst end of this, the new behavior could be buggy or broken.

  * Technically there may be a way to perform a much-too-large resolve that ignores environment markers and some wheel tags in order to collect all possible distributions needed for all possible interpreters in an IC range.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 19 (19 by maintainers)

Most upvoted comments

This example I have lying around only has 1 lock, but I think you get the idea:

$ cat lock.json | jq .
{
  "pex_version": "2.1.42",
  "requirements": [
    "requests"
  ],
  "resolves": {
    "manylinux_2_33_x86_64-cp-39-cp39": [
      {
        "name": "certifi",
        "sha256": "50b1e4f8446b06f41be7dd6338db18e0990601dce795c2b1686458aa7e8fa7d8",
        "source_artifact": null,
        "version": "2021.5.30"
      },
      {
        "name": "chardet",
        "sha256": "f864054d66fd9118f2e67044ac8981a54775ec5b67aed0441892edb553d21da5",
        "source_artifact": null,
        "version": "4.0.0"
      },
      {
        "name": "idna",
        "sha256": "b97d804b1e9b523befed77c48dacec60e6dcb0b5391d57af6a65a312a90648c0",
        "source_artifact": null,
        "version": "2.10"
      },
      {
        "name": "requests",
        "sha256": "c210084e36a42ae6b9219e00e48287def368a26d03a048ddad7bfee44f75871e",
        "source_artifact": null,
        "version": "2.25.1"
      },
      {
        "name": "urllib3",
        "sha256": "753a0374df26658f99d826cfe40394a686d05985786d946fbe4165b5148f5a7c",
        "source_artifact": null,
        "version": "1.26.5"
      }
    ]
  }
}
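As a rough sketch of how a consumer might use a lock shaped like the one above: look up the resolve by platform tag and fail loudly when there is no entry for the interpreter in hand, rather than silently re-resolving. The names `pins_for` and `NoLockForInterpreter` are hypothetical, the cp36 tag in the usage note is made up for illustration, and the JSON layout is assumed from this one example rather than from any documented Pex format.

```python
import json


class NoLockForInterpreter(LookupError):
    """Raised when the lock has no resolve for the requested platform tag."""


def pins_for(lock, platform_tag):
    """Return {name: (version, sha256)} for one locked resolve.

    Hypothetical helper; assumes the "resolves" layout shown in the example.
    """
    resolves = lock["resolves"]
    if platform_tag not in resolves:
        raise NoLockForInterpreter(
            f"no locked resolve for {platform_tag}; have: {sorted(resolves)}"
        )
    return {
        dist["name"]: (dist["version"], dist["sha256"])
        for dist in resolves[platform_tag]
    }


# Trimmed copy of the example lock (two of the five distributions).
LOCK = json.loads("""
{
  "pex_version": "2.1.42",
  "requirements": ["requests"],
  "resolves": {
    "manylinux_2_33_x86_64-cp-39-cp39": [
      {"name": "certifi",
       "sha256": "50b1e4f8446b06f41be7dd6338db18e0990601dce795c2b1686458aa7e8fa7d8",
       "source_artifact": null,
       "version": "2021.5.30"},
      {"name": "requests",
       "sha256": "c210084e36a42ae6b9219e00e48287def368a26d03a048ddad7bfee44f75871e",
       "source_artifact": null,
       "version": "2.25.1"}
    ]
  }
}
""")
```

With this, `pins_for(LOCK, "manylinux_2_33_x86_64-cp-39-cp39")` returns the pins, while asking for a tag such as `manylinux_2_33_x86_64-cp-36-cp36m` raises `NoLockForInterpreter`, which is exactly the "I have no lockfile" situation described in the issue.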

You shouldn’t be. Pex already resolves for every discovered interpreter on a machine that fits an IC range today (in parallel). We should be able to hit exactly that perf profile here too.

Not quite: lint/test use https://github.com/pantsbuild/pants/blob/b82a01ed0bffb5df09a877528efff4c44a6206a8/src/python/pants/backend/python/util_rules/pex.py#L125-L128, which has Pants choose a single interpreter to use in order to bypass that fanout.

Agreed on the rest.

I’m definitely sketched out by this from a performance perspective.

You shouldn’t be. Pex already resolves for every discovered interpreter on a machine that fits an IC range today (in parallel). We should be able to hit exactly that perf profile here too.

…it still seems like everything points to encouraging users to narrow their ranges significantly.

Maybe, but I think there is no way to avoid the fact that there simply will be use cases that use large ranges. We cannot tell those folks “don’t do that”.

Maybe the “cover the range” approach is safer, because rather than warning for the purposes of safety, we can warn for the purposes of performance (“[warn] Hey, did you know that >=3.7,<3.10 will result in 3 different resolves/lockfiles? Consider using a narrower range, or set --resolve-width-threshold=4 to silence this warning.”)
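A minimal sketch of what such a warning could look like, under loud assumptions: `--resolve-width-threshold` is only a proposal in this comment and not an existing option, the rule "warn when the width reaches the threshold" and the default of 2 are my guesses, and the range is simplified to CPython 3.x minors given as a lower bound and an exclusive upper bound.

```python
def resolve_width(lower_minor, upper_minor_exclusive):
    """Number of 3.x minor versions in the range >=3.lower,<3.upper."""
    return max(0, upper_minor_exclusive - lower_minor)


def width_warning(lower_minor, upper_minor_exclusive, threshold=2):
    """Return a warning string, or None if the range is narrow enough.

    Assumed semantics: warn when the width is at least `threshold`.
    """
    width = resolve_width(lower_minor, upper_minor_exclusive)
    if width < threshold:
        return None
    return (
        f"[warn] >=3.{lower_minor},<3.{upper_minor_exclusive} will result in "
        f"{width} different resolves/lockfiles. Consider using a narrower "
        f"range, or set --resolve-width-threshold={width + 1} to silence "
        f"this warning."
    )


if __name__ == "__main__":
    print(width_warning(7, 10))
```

Under these assumptions, `width_warning(7, 10)` warns about 3 resolves and `width_warning(7, 10, threshold=4)` stays silent, matching the wording of the proposed message.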

I really think we should do post-resolve processing of dist-info/METADATA and use the Requires-Python and Requires-Dist metadata to determine exactly the breadth of validity of a lock. I derailed things a bit with the “cover the range” terminology. It’s actually about covering the environment range as selected by environment markers. It’s just that the most common environment marker in use only picks out Python minor versions (python_version). Whether we warn or fail can be debated, but certainly that can be a Pants option.
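A rough sketch of the first step of that post-resolve processing, assuming the METADATA text has already been read out of a wheel's dist-info directory. It uses the standard library's email parser, since METADATA uses RFC 822 style headers, and simply collects Requires-Python plus every Requires-Dist entry that carries an environment marker. The function name and the sample metadata are illustrative; the sample is modeled on the kind of requirement that pulls importlib-metadata into a 3.7 resolve and is not copied from a real distribution. Actually evaluating the markers is out of scope here.

```python
from email.parser import Parser


def marker_sensitivity(metadata_text):
    """Return (requires_python, {requirement: marker}) from METADATA text.

    Hypothetical helper: only requirements with a marker are reported, since
    those are the ones that can make a lock interpreter-specific.
    """
    msg = Parser().parsestr(metadata_text)
    requires_python = msg.get("Requires-Python")
    conditional = {}
    for req in msg.get_all("Requires-Dist") or []:
        requirement, sep, marker = req.partition(";")
        if sep:
            conditional[requirement.strip()] = marker.strip()
    return requires_python, conditional


# Illustrative METADATA, not taken from a real wheel.
SAMPLE_METADATA = "\n".join([
    "Metadata-Version: 2.1",
    "Name: example-dist",
    "Version: 1.0",
    "Requires-Python: >=3.6",
    'Requires-Dist: importlib-metadata (>=0.12) ; python_version < "3.8"',
    "Requires-Dist: packaging",
    "",
    "",
])

if __name__ == "__main__":
    print(marker_sensitivity(SAMPLE_METADATA))
```

A lock whose distributions report no conditional requirements, and whose Requires-Python values all admit the whole IC range, would be a candidate for being valid across that range; anything else needs the markers evaluated per environment.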

I suspect we’ll need to solve this sooner.

Yes. A lot of design effort has been focused on UX. There are actual thorny, fundamental “does it even work” issues to sort out, though, before even getting to that. A swing in focus is needed to make sure we ship something that, at base, works.

Using #12312 for illustration on Pants itself, I want to highlight what your declaration of equivalency actually means in practice for a (I assume we agree) “reasonable” IC range, namely that of Pants itself:

$ ./pants --python-setup-interpreter-constraints="['==3.7.*']" lock src:: tests::
08:41:02.59 [INFO] Completed: Building pip_compile.pex with 1 requirement: pip-tools==6.2.0
08:41:11.17 [INFO] Completed: Generate lockfile for 19 requirementses: PyYAML<5.5,>=5.4, ansicolors==1.1.8, fasteners==0.16, freezegun==1.1.0, humbug==0.2.6, packaging==20.9, pex==2.1.42, psutil==5.8.0, pytest<6.3,>=6.0.1, request... (222 characters truncated)
08:41:11.17 [INFO] Wrote lockfile to 3rdparty/python/lockfile.txt
$ mv 3rdparty/python/lockfile.txt 3rdparty/python/lockfile.txt.37
$ ./pants --python-setup-interpreter-constraints="['==3.8.*']" lock src:: tests::
08:41:46.13 [INFO] Completed: Building pip_compile.pex with 1 requirement: pip-tools==6.2.0
08:41:56.68 [INFO] Completed: Generate lockfile for 19 requirementses: PyYAML<5.5,>=5.4, ansicolors==1.1.8, fasteners==0.16, freezegun==1.1.0, humbug==0.2.6, packaging==20.9, pex==2.1.42, psutil==5.8.0, pytest<6.3,>=6.0.1, request... (222 characters truncated)
08:41:56.68 [INFO] Wrote lockfile to 3rdparty/python/lockfile.txt
$ mv 3rdparty/python/lockfile.txt 3rdparty/python/lockfile.txt.38
$ ./pants --python-setup-interpreter-constraints="['==3.9.*']" lock src:: tests::
08:42:41.35 [INFO] Completed: Building pip_compile.pex with 1 requirement: pip-tools==6.2.0
08:42:51.35 [INFO] Completed: Generate lockfile for 19 requirementses: PyYAML<5.5,>=5.4, ansicolors==1.1.8, fasteners==0.16, freezegun==1.1.0, humbug==0.2.6, packaging==20.9, pex==2.1.42, psutil==5.8.0, pytest<6.3,>=6.0.1, request... (222 characters truncated)
08:42:51.35 [INFO] Wrote lockfile to 3rdparty/python/lockfile.txt
$ mv 3rdparty/python/lockfile.txt 3rdparty/python/lockfile.txt.39
$ diff3 3rdparty/python/lockfile.txt.3*
====
1:2c
  # This file is autogenerated by pip-compile with python 3.7
2:2c
  # This file is autogenerated by pip-compile with python 3.8
3:2c
  # This file is autogenerated by pip-compile with python 3.9
====1
1:97,102c
  importlib-metadata==4.6.1 \
      --hash=sha256:079ada16b7fc30dfbb5d13399a5113110dab1aa7c2bc62f66af75f0b717c8cac \
      --hash=sha256:9f55f560e116f8643ecf2922d9cd3e1c7e8d52e683178fecd9d08f6aa357e11e
      # via
      #   pluggy
      #   pytest
2:96a
3:96a
====1
1:272,274c
      # via
      #   -r requirements.in
      #   importlib-metadata
2:266c
3:266c
      # via -r requirements.in
====1
1:279,282c
  zipp==3.5.0 \
      --hash=sha256:957cfda87797e389580cb8b9e3870841ca991e2125350677b2ca83a0e99390a3 \
      --hash=sha256:f5812b1e007e48cff63449a5e9f4e7ebea716b4111f9c4f9a645f91d579bf0c4
      # via importlib-metadata
2:270a
3:270a

Afaict this is no bueno: not from a locked-resolve standpoint (I don’t want dep drift to shoot me in the foot tomorrow) and certainly not from a security (supply chain) standpoint.
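To check for this kind of divergence mechanically rather than by reading diff3 output, something like the following sketch could compare the pins in two pip-compile style lockfiles. It assumes each pin starts at column 0 as `name==version` with the `--hash` lines indented beneath it; the helper names and the trimmed sample texts are mine, with versions taken from the output above.

```python
import re

# A pin line such as "zipp==3.5.0 \"; indented --hash lines do not match.
PIN = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)==(\S+)")


def pins(lockfile_text):
    """Return {name: version} for every pinned requirement in the text."""
    found = {}
    for line in lockfile_text.splitlines():
        match = PIN.match(line)
        if match:
            found[match.group(1).lower()] = match.group(2)
    return found


def only_in_first(first, second):
    """Pins present in `first` but absent from `second`."""
    return {name: ver for name, ver in first.items() if name not in second}


# Trimmed stand-ins for lockfile.txt.37 and lockfile.txt.38.
LOCK_37 = "\n".join([
    "importlib-metadata==4.6.1 \\",
    "    --hash=sha256:079ada16b7fc30dfbb5d13399a5113110dab1aa7c2bc62f66af75f0b717c8cac",
    "packaging==20.9 \\",
    "zipp==3.5.0 \\",
    "    --hash=sha256:957cfda87797e389580cb8b9e3870841ca991e2125350677b2ca83a0e99390a3",
])
LOCK_38 = "\n".join([
    "packaging==20.9 \\",
])

if __name__ == "__main__":
    print(only_in_first(pins(LOCK_37), pins(LOCK_38)))
```

On the sample texts this reports importlib-metadata and zipp as present only in the 3.7 lock, which is the same difference the diff3 output shows.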