beam: Installation of apache-beam[gcp] in a clean environment gets stuck in pip's dependency resolver.

Installing apache-beam[gcp]==2.34.0 or a newer version into a clean virtual environment with pip 21.3 or newer currently gets stuck in pip’s dependency resolver.

Workarounds

  1. Preinstall the following Apache Beam dependencies before installing Beam itself:
     pip install google-api-core==1.31.6
     pip install google-cloud-pubsub==2.13.1 google-cloud-bigquery-storage==2.13.2
     pip install apache-beam[gcp]
  2. Use apache-beam[gcp]==2.33.0 or earlier, which is not as adversely affected.
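
The first workaround can also be sketched with a pip constraints file, so that the pins and Beam are resolved in a single `pip install` run. This is a minimal, untested sketch: the file name and both function names are hypothetical, and it assumes the three pinned versions stay compatible with the Beam release being installed.

```shell
# Write the pins from the first workaround into a constraints file.
# $1 is the path of the file to create (hypothetical name, e.g. beam-constraints.txt).
write_beam_constraints() {
  cat > "$1" <<'EOF'
google-api-core==1.31.6
google-cloud-pubsub==2.13.1
google-cloud-bigquery-storage==2.13.2
EOF
}

# Install Beam with the pins applied during the same resolver run.
# Assumption: this narrows the search enough to avoid the long backtracking.
install_beam_with_constraints() {
  write_beam_constraints "$1"
  python -m pip install --constraint "$1" 'apache-beam[gcp]'
}
```

Running `install_beam_with_constraints beam-constraints.txt` inside the fresh virtual environment would apply the pins. Unlike the preinstall steps, a constraints file only restricts versions and does not install anything by itself.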

Symptoms

One symptom of the numerous backtracking iterations performed by pip’s dependency resolver is a stream of warnings like:

 google-api-core 2.8.2 does not provide the extra 'grpcgcp'
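
To check whether an install attempt is hitting this backtracking, one option is to save pip's output to a file and count these warnings. A minimal sketch, assuming the output was captured to a log file; the function name `count_extra_warnings` and the log file name used below are hypothetical.

```shell
# Count how many times pip reported a missing extra in a saved log.
# $1 is the path to the captured pip output.
# grep -c exits non-zero when nothing matches, so "|| true" keeps the
# function usable under "set -e"; it still prints 0 in that case.
count_extra_warnings() {
  grep -c "does not provide the extra" "$1" || true
}
```

The log could be captured with `python -m pip install 'apache-beam[gcp]' 2>&1 | tee pip-install.log` and inspected with `count_extra_warnings pip-install.log`. A count that keeps growing across many candidate versions suggests the resolver is backtracking rather than making progress.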

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 19
  • Comments: 17 (10 by maintainers)

Most upvoted comments

As a temporary fix, I've had to use Poetry instead of pip, and it works fine. I would like to know why the dependency tree, when resolved with pip, forces the installation of google-api-core 2.8.2, which, as far as I can see, removed the extra grpcgcp in this pull request of the latest version.

For released versions, a workaround is to manually install google-cloud-bigquery-storage<2.14 before installing Apache Beam:

RUN pip install google-cloud-pubsub==2.13.1
RUN pip install google-cloud-bigquery-storage==2.13.2
RUN pip install apache-beam[gcp]==2.38.0

Another workaround is to use an older version of pip. Older versions (before 21.3) somehow don’t force the installation of google-api-core 2.8.2. Or, rather, an older pip seems to find a working set of dependency versions faster and doesn’t time out.

In your Dockerfile change:

RUN pip install --upgrade pip

to:

RUN pip install pip==21.2.4
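
Whether the pin is needed depends on the pip version already present in the image. The check below is a minimal sketch that treats pip 21.3 and newer as affected, as reported above; the function name `pip_is_affected` is hypothetical, and it assumes a version string of the form `MAJOR.MINOR[.PATCH]` with numeric components.

```shell
# Succeed (exit status 0) when the given pip version is 21.3 or newer.
# $1 is a version string such as 21.2.4 or 22.0.4.
pip_is_affected() {
  major="${1%%.*}"     # text before the first dot
  rest="${1#*.}"       # text after the first dot
  minor="${rest%%.*}"  # text before the next dot, if any
  [ "$major" -gt 21 ] || { [ "$major" -eq 21 ] && [ "$minor" -ge 3 ]; }
}
```

Since `pip --version` prints a line starting with `pip X.Y.Z from`, the installed version can be passed in with `pip_is_affected "$(python -m pip --version | awk '{print $2}')"`.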

Only downgrading to v2.33.0 worked in my case.

I also had to stop explicitly installing any library that apache-beam already declares as a dependency, and cap the versions of those it does not, such as storage (google-cloud-storage<=2.4.0) or secret-manager (google-cloud-secret-manager<=2.11.1).

Hope it helps

Maybe slightly unrelated, but when launching the above image as a Dataflow job via the Flex Template launcher, the GCE instance that sets up the job runs into the same issue (google-api-core 2.8.2 does not provide the extra ‘grpcgcp’), even though the SDK image I built using the above fix does not use google-api-core 2.8.2. I have raised an issue with the Dataflow team: https://issuetracker.google.com/issues/238658546.