beam: [Bug]: Python 3.10 installing `apache-beam==2.43.0` with `multiprocess>=0.70.12` has incompatibility for `dill`
What happened?
On Python 3.10 it is not possible to install apache-beam==2.43.0 together with multiprocess. This is due to Python 3.10 only being supported by multiprocess>=0.70.12 which requires dill>=0.3.4 and is in conflict with the apache-beam requirement for dill>=0.3.1.1,<0.3.2.
These libraries are used together, for example, in the datasets library.
Is there a specific reason for the apache-beam version requirement of the dill package? If not, maybe this could be updated to fix the issue for Python 3.10? Although I have not tested it, the same issue might apply to Python 3.9 which is only supported by multiprocess>=0.70.11 requiring dill>=0.3.3.
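The conflict described above can be checked mechanically. Below is a minimal sketch in plain Python, not pip or Beam code: it encodes the dill ranges quoted in this issue and tests a few dill release numbers against them. The helper names (`parse`, `satisfies`, `common_versions`) and the candidate list are assumptions made for illustration, and only `>=` and `<` clauses are handled; a real resolver implements PEP 440 in full.

```python
# Illustrative sketch. The constraint strings are taken from the issue text;
# the helpers and the candidate list are assumptions made for this example.

def parse(version):
    """Turn a dotted release string such as '0.3.1.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def pad(a, b):
    """Right-pad two version tuples with zeros so 0.3.2 compares as 0.3.2.0."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def satisfies(version, constraints):
    """Check a version against comma-separated '>=' and '<' clauses only."""
    for clause in constraints.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            have, bound = pad(parse(version), parse(clause[2:]))
            if have < bound:
                return False
        elif clause.startswith("<"):
            have, bound = pad(parse(version), parse(clause[1:]))
            if have >= bound:
                return False
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


def common_versions(candidates, *constraint_sets):
    """Return the candidates that satisfy every constraint set."""
    return [v for v in candidates
            if all(satisfies(v, c) for c in constraint_sets)]


BEAM_DILL = ">=0.3.1.1,<0.3.2"       # apache-beam==2.43.0, per this issue
MULTIPROCESS_DILL_PY310 = ">=0.3.4"  # multiprocess>=0.70.12, per this issue
MULTIPROCESS_DILL_PY39 = ">=0.3.3"   # multiprocess>=0.70.11, per this issue

# A few dill releases, listed here only as test candidates.
CANDIDATES = ["0.3.1.1", "0.3.2", "0.3.3", "0.3.4", "0.3.5", "0.3.6"]

print(common_versions(CANDIDATES, BEAM_DILL, MULTIPROCESS_DILL_PY310))  # []
print(common_versions(CANDIDATES, BEAM_DILL, MULTIPROCESS_DILL_PY39))   # []
```

Both lists come out empty because Beam's upper bound `<0.3.2` lies below either lower bound required by multiprocess, so no dill version can satisfy both packages at once.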
Issue Priority
Priority: 2
Issue Component
Component: dependencies
About this issue
- State: open
- Created 2 years ago
- Reactions: 2
- Comments: 17 (13 by maintainers)
Commits related to this issue
- limit multiprocess version, see https://github.com/apache/beam/issues/24458#issuecomment-1343034865 — committed to helpmefindaname/ner-eval-dashboard by helpmefindaname a year ago
We plan to update to the next version of dill before the next Beam release. I am now looking into the issue.
Pinning the latest version is possible; we do that for cloudpickle. However, Beam is currently incompatible with the latest version of dill as far as I know. There is ongoing work in https://github.com/apache/beam/pull/23870 to vendor dill, which would remove the tight bound.
I also have a similar issue. I’m trying to install HuggingFace `datasets`, which depends on `multiprocess`, so switching to `multiprocessing` is not an option. Right now `dill` is pinned to `0.3.1.1` in Beam, which is from 2019, while `datasets` 2.x is from 2022 and is pinning to the latest version of `multiprocess` available, so it’s impossible to make it work.
Could it be an option to pin each Beam release with the latest version available of `dill`, similar to what `multiprocess` does? That would also benefit from any improvements and bug fixes they make. Since both server and workers would have the same Beam version, they should also have the same `dill` version, right?
Unfortunately right now you are dealing with two packages that have tight constraints, and I understand the inconvenience. We plan to address the inconvenience on our side by vendoring dill, but that would take some time. I don’t see a clean solution right now, but it sounds like installing a newer version of multiprocess, while ignoring its constraints, would work for you. You can ignore Beam’s constraint on dill, but if you do so, you need to make sure you install the same version of dill on the workers, or your pipelines will fail.
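The warning about workers can be turned into an early check. The following is a minimal sketch, assuming the expected dill version is written down where the pipeline is submitted and the same check is run inside the worker environment. `check_pinned`, `installed_version`, and `EXPECTED` are hypothetical names, not Beam APIs; the only library used is the standard-library `importlib.metadata`.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pinned(expected, lookup=installed_version):
    """Return a list of mismatches between expected and installed versions."""
    problems = []
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            problems.append(f"{package}: expected {wanted}, found {found}")
    return problems


# Hypothetical pin: use whatever dill version the submitting environment has.
EXPECTED = {"dill": "0.3.1.1"}

if __name__ == "__main__":
    for problem in check_pinned(EXPECTED):
        print(problem)
```

Running the same script in both environments and comparing the output is one way to notice a dill mismatch before a pipeline fails; how the check gets onto the workers depends on the runner and is not covered here.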
The only solution I have found is to pin the `multiprocess` dependency.
But this would mean that the package is highly constrained with respect to `multiprocess`. It also means that for Python >= 3.9 we require a two-step installation (see https://github.com/uqfoundation/multiprocess/issues/125#issue-1471452077).