beam: [Bug]: Python 3.10 installing `apache-beam==2.43.0` with `multiprocess>=0.70.12` has incompatibility for `dill`
What happened?
On Python 3.10 it is not possible to install `apache-beam==2.43.0` together with `multiprocess`. This is because Python 3.10 is only supported by `multiprocess>=0.70.12`, which requires `dill>=0.3.4`. That conflicts with the apache-beam requirement `dill>=0.3.1.1,<0.3.2`.
These libraries are used together, for example, in the datasets library.
Is there a specific reason for the apache-beam version requirement on the `dill` package? If not, could it be updated to fix the issue for Python 3.10? I have not tested it, but the same issue might apply to Python 3.9, which is only supported by `multiprocess>=0.70.11`, requiring `dill>=0.3.3`.
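The conflict comes down to two version ranges with no overlap. Beam's range is `[0.3.1.1, 0.3.2)`. `multiprocess` needs `[0.3.4, no upper bound)`, or `[0.3.3, no upper bound)` on Python 3.9. The check below is a minimal, illustrative sketch of that reasoning, and the helper names `parse` and `ranges_overlap` are hypothetical. It is not how pip's resolver works. It assumes plain dotted numeric versions without pre-release or local tags; a real tool would use full PEP 440 parsing.

```python
# Illustrative sketch: do two "lower <= v < upper" version ranges overlap?
# Versions are compared as integer tuples, which is enough for the plain
# numeric versions quoted in this issue (not a full PEP 440 implementation).
from typing import Optional, Tuple

Version = Tuple[int, ...]


def parse(version: str) -> Version:
    """Turn '0.3.1.1' into (0, 3, 1, 1), dropping trailing zeros so '1.0' == '1'."""
    parts = [int(part) for part in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def ranges_overlap(
    low_a: str, high_a: Optional[str], low_b: str, high_b: Optional[str]
) -> bool:
    """Return True if [low_a, high_a) and [low_b, high_b) share any version.

    An upper bound of None means "no upper limit".
    """
    lower = max(parse(low_a), parse(low_b))
    uppers = [parse(h) for h in (high_a, high_b) if h is not None]
    if not uppers:
        return True
    # The intersection [lower, min(uppers)) is non-empty only if lower < min(uppers).
    return lower < min(uppers)


# Beam 2.43.0: dill>=0.3.1.1,<0.3.2 (as stated in the issue).
beam_low, beam_high = "0.3.1.1", "0.3.2"
# multiprocess>=0.70.12 (Python 3.10) needs dill>=0.3.4.
print(ranges_overlap(beam_low, beam_high, "0.3.4", None))  # False
# multiprocess>=0.70.11 (Python 3.9) needs dill>=0.3.3.
print(ranges_overlap(beam_low, beam_high, "0.3.3", None))  # False
```

Both checks print `False`. The highest dill Beam accepts is below 0.3.2, while the lowest dill either `multiprocess` release accepts is 0.3.3 or later. No installer can satisfy both constraints without ignoring one of them.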
Issue Priority
Priority: 2
Issue Component
Component: dependencies
About this issue
- State: open
- Created 2 years ago
- Reactions: 2
- Comments: 17 (13 by maintainers)
Commits related to this issue
- limit multiprocess version, see https://github.com/apache/beam/issues/24458#issuecomment-1343034865 — committed to helpmefindaname/ner-eval-dashboard by helpmefindaname a year ago
We plan to update to the next version of dill before the next Beam release. I am now looking into the issue.
Pinning the latest version is possible (we do that for cloudpickle), but as far as I know Beam is currently incompatible with the latest version of dill. There is ongoing work in https://github.com/apache/beam/pull/23870 to vendor dill, which would remove the tight bound.
I also have a similar issue. I'm trying to install HuggingFace `datasets`, which depends on `multiprocess`, so switching to `multiprocessing` is not an option. Right now `dill` is pinned to `0.3.1.1` in Beam, which dates from 2019. Meanwhile, `datasets` 2.x is from 2022 and pins the latest available version of `multiprocess`, so it's impossible to make the two work together. Could each Beam release pin the latest available version of `dill`, as `multiprocess` does? That would also bring in any improvements and bug fixes they make. Since both the server and the workers would run the same Beam version, they should also have the same `dill` version, right?

Unfortunately, right now you are dealing with two packages that have tight constraints, and I understand the inconvenience. We plan to address it on our side by vendoring dill, but that will take some time. I don't see a clean solution right now, but it sounds like installing a newer version of `multiprocess` while ignoring its constraints would work for you. You can ignore Beam's constraint on dill, but if you do so, you need to make sure you install the same version of dill on the workers, or your pipelines will fail.
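If you do override Beam's dill constraint, one way to act on the warning above is to check at runtime that an environment has exactly the dill version you expect. The sketch below is hypothetical: the helpers `installed_version` and `check_pinned` are not part of Beam. It only uses the standard library's `importlib.metadata` (Python 3.8+), and the expected version string is an assumption you would supply yourself.

```python
# Hypothetical helpers (not part of Beam): verify that an installed
# distribution matches an expected version, e.g. dill on a worker.
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution's version string, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pinned(package: str, expected: str) -> None:
    """Raise RuntimeError if `package` is missing or not exactly `expected`."""
    found = installed_version(package)
    if found != expected:
        raise RuntimeError(
            f"{package}=={expected} expected, found {found or 'nothing'}; "
            "the launcher and the workers must use the same version"
        )
```

One possible usage, assuming you control the pipeline code, is to record `installed_version("dill")` on the launcher and pass that string into your `DoFn`. You can then call `check_pinned("dill", ...)` from the `DoFn`'s `setup` method, so a mismatch fails immediately with a clear message instead of failing less obviously later during deserialization.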
The only solution I have found is to pin the `multiprocess` dependency:

But this means the package is highly constrained with respect to `multiprocess`. It also means that for Python >= 3.9 we require a two-step installation, see https://github.com/uqfoundation/multiprocess/issues/125#issue-1471452077: