beam: [Bug]: Python 3.10 installing `apache-beam==2.43.0` with `multiprocess>=0.70.12` has incompatibility for `dill`

What happened?

On Python 3.10 it is not possible to install apache-beam==2.43.0 together with multiprocess. This is due to Python 3.10 only being supported by multiprocess>=0.70.12 which requires dill>=0.3.4 and is in conflict with the apache-beam requirement for dill>=0.3.1.1,<0.3.2.

These libraries are used together, for example, in the datasets library.

Is there a specific reason for the apache-beam version requirement of the dill package? If not, maybe this could be updated to fix the issue for Python 3.10? Although I have not tested it, the same issue might apply to Python 3.9 which is only supported by multiprocess>=0.70.11 requiring dill>=0.3.3.
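The conflict described above can be checked by hand. The following is a minimal sketch, not how pip resolves dependencies: it uses a simplified numeric version comparison, and the `CANDIDATES` list is only an illustrative sample of dill releases. The helper names (`parse`, `satisfies`, `common_versions`) are hypothetical.

```python
import operator

# Only the two comparison operators that appear in the constraints above.
OPS = {">=": operator.ge, "<": operator.lt}


def parse(version, width=4):
    """Turn '0.3.1.1' into (0, 3, 1, 1), zero-padded so '0.3.4' == '0.3.4.0'."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version, constraints):
    """Return True if `version` meets every (operator, bound) pair."""
    return all(OPS[op](parse(version), parse(bound)) for op, bound in constraints)


# Constraints quoted in the issue text.
BEAM_DILL = [(">=", "0.3.1.1"), ("<", "0.3.2")]   # apache-beam==2.43.0
MULTIPROCESS_PY310 = [(">=", "0.3.4")]            # multiprocess>=0.70.12
MULTIPROCESS_PY39 = [(">=", "0.3.3")]             # multiprocess>=0.70.11

# Illustrative sample of dill releases (assumption: not an exhaustive list).
CANDIDATES = ["0.3.1.1", "0.3.2", "0.3.3", "0.3.4", "0.3.5.1", "0.3.6"]


def common_versions(*constraint_sets):
    """List the candidate versions accepted by all given constraint sets."""
    return [v for v in CANDIDATES if all(satisfies(v, c) for c in constraint_sets)]


print(common_versions(BEAM_DILL))                      # ['0.3.1.1']
print(common_versions(BEAM_DILL, MULTIPROCESS_PY310))  # []
print(common_versions(BEAM_DILL, MULTIPROCESS_PY39))   # []
```

Both combined sets come out empty, which is why the resolver cannot find a dill version that satisfies both packages.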

Issue Priority

Priority: 2

Issue Component

Component: dependencies

About this issue

  • State: open
  • Created 2 years ago
  • Reactions: 2
  • Comments: 17 (13 by maintainers)

Most upvoted comments

We plan to update to the next version of dill before the next Beam release. I am now looking into the issue.

Pinning the latest version is possible, and we do that for cloudpickle, but as far as I know Beam is currently incompatible with the latest version of dill. There is ongoing work in https://github.com/apache/beam/pull/23870 to vendor dill, which would remove the tight bound.

I also have a similar issue. I’m trying to install HuggingFace datasets, which depends on multiprocess, so switching to multiprocessing is not an option. Right now Beam pins dill to 0.3.1.1, which is from 2019, while datasets 2.x is from 2022 and pins the latest available version of multiprocess, so it’s impossible to make them work together.

Could it be an option to pin each Beam release to the latest available version of dill, similar to what multiprocess does? Beam would then also benefit from any improvements and bug fixes made in dill. Since both the server and the workers would have the same Beam version, they should also have the same dill version, right?

Unfortunately, right now you are dealing with two packages that have tight constraints, and I understand the inconvenience. We plan to address this on our side by vendoring dill, but that will take some time. I don’t see a clean solution right now, but it sounds like installing a newer version of multiprocess while ignoring its constraints would work for you. You can also ignore Beam’s constraint on dill, but if you do so, you need to make sure you install the same version of dill on the workers, or your pipelines will fail.
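If you do override the dill pin, one way to catch a mismatch early is to compare the installed dill version against the one you expect. This is only a sketch under stated assumptions: `check_pinned` and `installed_version` are hypothetical helpers, the expected version is whatever you chose to install, and where to call this on the workers is left open.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pinned(package, expected, lookup=installed_version):
    """Raise if the installed version of `package` differs from `expected`.

    `lookup` is injectable so the check can be exercised without the package.
    """
    found = lookup(package)
    if found != expected:
        raise RuntimeError(
            f"{package} version mismatch: expected {expected}, found {found}"
        )
    return found


# Example call (assumption: you installed this dill version everywhere):
# check_pinned("dill", "0.3.6")
```

Running the same check with the same expected version on the submitting machine and on the workers makes a mismatch fail loudly at startup rather than partway through a pipeline.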

The only solution I have found is to pin the multiprocess dependency:

import setuptools

setuptools.setup(
    ...,
    # Pin multiprocess to an older release so it resolves alongside apache-beam.
    install_requires=["apache-beam", "multiprocess==0.70.9", ...],
)

But this would mean that the package is highly constrained with respect to multiprocess. It also means that for Python >= 3.9 we require a two-step installation, see https://github.com/uqfoundation/multiprocess/issues/125#issue-1471452077:

pip install -e .
pip install --no-deps "multiprocess>=0.70.11"