beam: Installation of apache-beam[gcp] in a clean environment gets stuck in pip's dependency resolver.
Installing apache-beam[gcp]==2.34.0 or a newer version into a clean virtual environment with pip 21.3 or newer currently gets stuck in pip’s dependency resolver.
Workarounds
- Preinstall the following Apache Beam dependencies prior to installing Beam itself:
pip install google-api-core==1.31.6
pip install google-cloud-pubsub==2.13.1 google-cloud-bigquery-storage==2.13.2
pip install apache-beam[gcp]
- Use apache-beam[gcp]==2.33.0 or earlier; those versions are not as adversely affected.
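The preinstall order from the first workaround can also be scripted. The following is a minimal sketch, assuming Python 3 and that pip is invoked as `python -m pip`. The names `PREINSTALL_STEPS`, `build_commands`, and `run_commands` are illustrative and not part of Beam or pip; the version pins are the ones listed above.

```python
import subprocess
import sys

# Package groups in the order given in the workaround above.
# Each inner list becomes one separate `pip install` invocation.
PREINSTALL_STEPS = [
    ["google-api-core==1.31.6"],
    ["google-cloud-pubsub==2.13.1", "google-cloud-bigquery-storage==2.13.2"],
    ["apache-beam[gcp]"],
]


def build_commands(steps=PREINSTALL_STEPS, python=sys.executable):
    """Return one `python -m pip install ...` argument list per step."""
    return [[python, "-m", "pip", "install", *packages] for packages in steps]


def run_commands(commands, runner=subprocess.check_call):
    """Pass each command to `runner`, in order.

    With the default runner, a failing install raises CalledProcessError,
    so later steps are not attempted. `runner` is injectable so the
    sequence can be dry-run, for example with `print`.
    """
    for command in commands:
        runner(command)
```

Calling `run_commands(build_commands())` inside the target virtual environment performs the three installs, while `run_commands(build_commands(), runner=print)` only prints them. Passing the arguments as a list avoids shell quoting problems with the square brackets in `apache-beam[gcp]`.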
Symptoms
One symptom of the numerous backtracking iterations performed by pip’s dependency resolver is a stream of warnings like:
google-api-core 2.8.2 does not provide the extra 'grpcgcp'
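To spot this symptom in a saved pip log, the warning can be matched with a regular expression. This is a rough sketch that assumes the log contains lines in exactly the form quoted above; `EXTRA_WARNING` and `find_extra_warnings` are hypothetical names. Many hits for the same package are only a hint that the resolver is backtracking, not proof.

```python
import re

# Matches warnings of the form quoted above, e.g.
# "google-api-core 2.8.2 does not provide the extra 'grpcgcp'".
# Both straight and curly quotes around the extra name are accepted.
EXTRA_WARNING = re.compile(
    r"(?P<name>[A-Za-z0-9._-]+) (?P<version>\S+) does not provide the extra "
    r"['‘](?P<extra>[^'’]+)['’]"
)


def find_extra_warnings(log_text):
    """Return (package, version, extra) tuples, one per matching warning."""
    return [
        (m.group("name"), m.group("version"), m.group("extra"))
        for m in EXTRA_WARNING.finditer(log_text)
    ]
```

The length of the returned list gives a quick count of how often the warning appeared in the log.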


About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 19
- Comments: 17 (10 by maintainers)
For released versions, a workaround is to manually install google-cloud-bigquery-storage<2.14 before installing Apache Beam, for example with pip install "google-cloud-bigquery-storage<2.14".
Another workaround is to use an older version of pip. Older versions (before 21.3) somehow don’t force the installation of google-api-core 2.8.2. Or, rather, an older pip seems to be able to figure out a working set of dependency versions faster and doesn’t time out. In your Dockerfile, pin pip to a version before 21.3.
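To check whether an environment runs one of the pip versions described above, the major and minor version numbers can be compared against 21.3. This is a minimal sketch: `FIRST_AFFECTED_PIP`, `parse_major_minor`, `pip_is_affected`, and `installed_pip_version` are illustrative names, and the 21.3 threshold is taken from this report rather than from pip's documentation.

```python
import re
from importlib import metadata

# First pip release reported above as affected.
FIRST_AFFECTED_PIP = (21, 3)


def parse_major_minor(version):
    """Return (major, minor) from a version string such as '21.2.4'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def pip_is_affected(version):
    """True if `version` is 21.3 or newer, per the report above."""
    return parse_major_minor(version) >= FIRST_AFFECTED_PIP


def installed_pip_version():
    """Version string of the pip installed in the current environment."""
    return metadata.version("pip")
```

For example, `pip_is_affected(installed_pip_version())` can be evaluated in a build step before installing Beam, to decide whether one of the workarounds is needed.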
+1 here
Only downgrading to v2.33.0 worked in my case.
I also had to stop explicitly installing any library that apache-beam already pulls in as a dependency, and cap the versions of those that are not Beam dependencies, such as storage (google-cloud-storage<=2.4.0) or secret-manager (google-cloud-secret-manager<=2.11.1).
Hope it helps
Maybe slightly unrelated, but when launching the above image as a Dataflow job via the Flex Template launcher, the GCE instance setting up the job runs into the same issue (google-api-core 2.8.2 does not provide the extra 'grpcgcp'), even though the SDK image I built using the above fix does not use google-api-core 2.8.2. I have raised an issue with the Dataflow team: https://issuetracker.google.com/issues/238658546.