pip: "check" does not consider extras and allows environments that cannot be re-created with the new resolver

I tried installing Airflow 2.0.0rc3 with the current master version of pip (to become 20.3.2).

When I run the same installation using pip 20.2.4, the whole install takes about 20 seconds (with some of the prerequisites already pre-installed) and produces a consistent installation without any conflicts:

Successfully installed apache-airflow-2.0.0rc3 apache-airflow-providers-amazon-1.0.0 apache-airflow-providers-apache-cassandra-1.0.0 apache-airflow-providers-apache-druid-1.0.0 apache-airflow-providers-apache-hdfs-1.0.0 apache-airflow-providers-apache-hive-1.0.0 apache-airflow-providers-apache-kylin-1.0.0 apache-airflow-providers-apache-livy-1.0.0 apache-airflow-providers-apache-pig-1.0.0 apache-airflow-providers-apache-pinot-1.0.0 apache-airflow-providers-apache-spark-1.0.0 apache-airflow-providers-apache-sqoop-1.0.0 apache-airflow-providers-celery-1.0.0 apache-airflow-providers-cloudant-1.0.0 apache-airflow-providers-cncf-kubernetes-1.0.0 apache-airflow-providers-databricks-1.0.0 apache-airflow-providers-datadog-1.0.0 apache-airflow-providers-dingding-1.0.0 apache-airflow-providers-discord-1.0.0 apache-airflow-providers-docker-1.0.0 apache-airflow-providers-elasticsearch-1.0.0 apache-airflow-providers-exasol-1.0.0 apache-airflow-providers-facebook-1.0.0 apache-airflow-providers-ftp-1.0.0 apache-airflow-providers-google-1.0.0 apache-airflow-providers-grpc-1.0.0 apache-airflow-providers-hashicorp-1.0.0 apache-airflow-providers-http-1.0.0 apache-airflow-providers-imap-1.0.0 apache-airflow-providers-jdbc-1.0.0 apache-airflow-providers-jenkins-1.0.0 apache-airflow-providers-jira-1.0.0 apache-airflow-providers-microsoft-azure-1.0.0 apache-airflow-providers-microsoft-mssql-1.0.0 apache-airflow-providers-microsoft-winrm-1.0.0 apache-airflow-providers-mongo-1.0.0 apache-airflow-providers-mysql-1.0.0 apache-airflow-providers-odbc-1.0.0 apache-airflow-providers-openfaas-1.0.0 apache-airflow-providers-opsgenie-1.0.0 apache-airflow-providers-oracle-1.0.0 apache-airflow-providers-pagerduty-1.0.0 apache-airflow-providers-papermill-1.0.0 apache-airflow-providers-plexus-1.0.0 apache-airflow-providers-postgres-1.0.0 apache-airflow-providers-presto-1.0.0 apache-airflow-providers-qubole-1.0.0 apache-airflow-providers-redis-1.0.0 apache-airflow-providers-salesforce-1.0.0 
apache-airflow-providers-samba-1.0.0 apache-airflow-providers-segment-1.0.0 apache-airflow-providers-sendgrid-1.0.0 apache-airflow-providers-sftp-1.0.0 apache-airflow-providers-singularity-1.0.0 apache-airflow-providers-slack-1.0.0 apache-airflow-providers-snowflake-1.0.0 apache-airflow-providers-sqlite-1.0.0 apache-airflow-providers-ssh-1.0.0 apache-airflow-providers-telegram-1.0.0 apache-airflow-providers-vertica-1.0.0 apache-airflow-providers-yandex-1.0.0 apache-airflow-providers-zendesk-1.0.0
WARNING: You are using pip version 20.2.4; however, version 20.3.1 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
root@5fea5c1105cb:/opt/airflow# pip check
No broken requirements found.
root@5fea5c1105cb:/opt/airflow# 

With pip 20.3.2@master, it takes about 30 minutes and fails with a “Cannot install” error:

ERROR: Cannot install google-cloud-bigquery[bqstorage,pandas]==2.4.0, nteract-scrapbook==0.4.1, nteract-scrapbook[all]==0.4.1 and papermill[all]==2.2.2 because these package versions have conflicting dependencies.

The conflict is caused by:
    nteract-scrapbook[all] 0.4.1 depends on pyarrow
    nteract-scrapbook 0.4.1 depends on pyarrow
    papermill[all] 2.2.2 depends on pyarrow; extra == "all"
    google-cloud-bigquery[bqstorage,pandas] 2.4.0 depends on pyarrow<3.0dev and >=1.0.0; extra == "bqstorage"

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies

The whole installation log is attached: pip-20-3-2-master.txt

Installation environment:

Latest official Python 3.6 image with airflow dependencies (based on debian buster):

docker pull apache/airflow:master-python3.6-ci

Installation method:

pip install apache-airflow[devel_ci]==2.0.0rc3 --constraint https://raw.githubusercontent.com/apache/airflow/constraints-master/constraints-3.6.txt

The easiest way to reproduce:

git clone https://github.com/apache/airflow.git
./breeze --install-airflow-version 2.0.0rc3 --skip-mounting-local-sources
# While in the container

# uninstall airflow and all 60 providers
pip uninstall apache-airflow -y
pip freeze  |grep airflow-providers |xargs pip uninstall -y

# Install pip from master
# NOTE! Skip this step if you want to compare with 20.2.4 (this is the default version we have in the image)
pip install git+https://github.com/pypa/pip@master

# install airflow in the recommended way
pip install apache-airflow[devel_ci]==2.0.0rc3 --constraint https://raw.githubusercontent.com/apache/airflow/constraints-master/constraints-3.6.txt

Related issues: #9298 #9297

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 15 (15 by maintainers)

Most upvoted comments

I investigated question 3 (why pip check did not catch the conflict). pip check missed it because pyarrow<3.0dev,>=1.0.0 is only specified by google-cloud-bigquery in extras (three extras, pandas, bqstorage, and all, contain it), and pip check does not follow extras (because pip does not record them in the first place). So I’d guess the new resolver is correct, and there really isn’t a solution with pyarrow==0.17.1.

How we could improve pip check though, is a whole big topic…
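To make the point about extras concrete, here is a minimal sketch of a checker that either ignores or honours extras. It is a simplified model, not pip's actual implementation: `Requirement`, `find_conflicts`, and the `requested_extras` mapping are hypothetical names invented for illustration, versions are treated as plain numeric tuples, and `<3.0dev` is approximated as an exclusive upper bound of 3.0.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Optional, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Parse a purely numeric version such as "0.17.1" (no pre-releases)."""
    return tuple(int(part) for part in text.split("."))


class Requirement(NamedTuple):
    """Hypothetical, simplified stand-in for one dependency declaration."""
    name: str
    lower: Version         # inclusive lower bound
    upper: Version         # exclusive upper bound
    extra: Optional[str]   # None = unconditional, else active only with that extra


def find_conflicts(
    installed: Dict[str, str],
    metadata: Dict[str, List[Requirement]],
    requested_extras: Optional[Dict[str, FrozenSet[str]]] = None,
) -> List[Tuple[str, str, str]]:
    """Return (owner, dependency, installed_version) for each violated requirement.

    With requested_extras left empty, extra-only requirements are skipped,
    which mirrors the behaviour described above for pip check.
    """
    requested_extras = requested_extras or {}
    conflicts = []
    for owner, requirements in metadata.items():
        active = requested_extras.get(owner, frozenset())
        for req in requirements:
            if req.extra is not None and req.extra not in active:
                continue  # requirement applies only when the extra was requested
            version = installed.get(req.name)
            if version is None:
                continue  # missing packages are out of scope for this sketch
            if not (req.lower <= parse_version(version) < req.upper):
                conflicts.append((owner, req.name, version))
    return conflicts


# The situation from this issue, reduced to the two relevant packages.
INSTALLED = {"google-cloud-bigquery": "2.4.0", "pyarrow": "0.17.1"}
METADATA = {
    "google-cloud-bigquery": [
        Requirement("pyarrow", (1, 0, 0), (3, 0), "bqstorage"),
    ],
}

# No record of which extras were requested: nothing is reported.
without_extras = find_conflicts(INSTALLED, METADATA)

# Extras known (assumption: the installer had recorded them): conflict is found.
with_extras = find_conflicts(
    INSTALLED, METADATA, {"google-cloud-bigquery": frozenset({"bqstorage"})}
)

print(without_extras)
print(with_extras)
```

The second call only works because the requested extras are passed in explicitly, which is exactly the information that is not recorded at install time.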

So I think the questions here for us are

  1. Does pyarrow==0.17.1 actually work in this dependency graph?
  2. If it does, why did the new resolver fail to find a solution?
  3. If it does not, why did pip check not catch the conflict?

I have some GOOD NEWS this time.

I did some experiments and manually upgraded the pyarrow version to 2.0.0 (despite the eager upgrade in 20.2.4 not upgrading it), and … suddenly pip@master seems to work!

root@5fea5c1105cb:/opt/airflow# pip check
No broken requirements found.
root@5fea5c1105cb:/opt/airflow# 

I do not see any backtracking and it seems to be reasonably fast. I just repeated a fresh install (I uninstalled all installed packages beforehand), and suddenly it works. It seems that with pyarrow 2.0.0, pip 20.2.4 also works (even though with the eager upgrade strategy it did not upgrade pyarrow before). So indeed it looks like a bug in pip 20.2.4 generated bad constraints and caused the new pip to go haywire.

I will manually update all the constraints now.
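For anyone who wants to script that kind of pin change rather than edit by hand, a rough sketch follows. It assumes the constraints file contains one `name==version` pin per line; `update_pin` is a hypothetical helper, not part of pip or Airflow, and the sample text is illustrative rather than the real constraints file.

```python
import re


def update_pin(constraints_text: str, package: str, new_version: str) -> str:
    """Replace the `package==...` line in a constraints file with a new pin.

    Only exact `name==version` lines are touched; comments, other packages,
    and name normalisation beyond letter case are not handled.
    """
    pattern = re.compile(rf"^{re.escape(package)}==\S+$", re.IGNORECASE)
    updated = []
    for line in constraints_text.splitlines():
        if pattern.match(line.strip()):
            updated.append(f"{package}=={new_version}")
        else:
            updated.append(line)
    return "\n".join(updated) + "\n"


# Illustrative input only (the first package name is made up).
sample = "some-other-package==1.2.3\npyarrow==0.17.1\n"
result = update_pin(sample, "pyarrow", "2.0.0")
print(result)
```

After a change like this it would still be worth re-running the install with the new resolver, since pip check alone does not see extras.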

20.3.3, later today. 😃