tfx: Dataflow workers not able to install tfx from requirements file due to `no-binary` option from beam stager

When no Beam packaging arguments are provided by the user, TFX generates a requirements file with the tfx package inside.

This ends up failing on Dataflow, because the Beam stager uses pip’s --no-binary flag: https://github.com/apache/beam/blob/v2.15.0/sdks/python/apache_beam/runners/portability/stager.py#L483.

Indeed, in a fresh virtualenv (Python 3.6.3):

pip download tfx==0.14.0 --no-binary :all:
Collecting tfx==0.14.0
  ERROR: Could not find a version that satisfies the requirement tfx==0.14.0 (from versions: none)
ERROR: No matching distribution found for tfx==0.14.0

Whereas if I remove the --no-binary flag, it works just fine.

I’m not all that knowledgeable about Python packaging, but is this because TFX is published only as a wheel? Is there some Beam option I can pass to make this work?
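One way to check the wheel-only theory is to look at which distribution types a release publishes on PyPI. The sketch below assumes the PyPI JSON API at https://pypi.org/pypi/<name>/<version>/json, whose "urls" entries carry a "packagetype" field such as "sdist" or "bdist_wheel". The helper names and the sample payload are made up for illustration and are not real data for tfx.

```python
import json
import urllib.request


def package_types(release_json):
    """Return the set of distribution types listed in a PyPI release payload."""
    return {f["packagetype"] for f in release_json.get("urls", [])}


def has_sdist(release_json):
    """True if the release ships a source distribution.

    pip can only satisfy --no-binary :all: from a source distribution.
    """
    return "sdist" in package_types(release_json)


def fetch_release_json(name, version):
    """Fetch release metadata from PyPI (needs network access)."""
    url = f"https://pypi.org/pypi/{name}/{version}/json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


# Hand-written sample payload for illustration only; NOT real PyPI data.
wheel_only = {
    "urls": [
        {"packagetype": "bdist_wheel", "filename": "example-1.0-py3-none-any.whl"},
    ]
}
```

With network access, has_sdist(fetch_release_json("tfx", "0.14.0")) would show whether that release includes a source distribution. If it returns False, the pip error above is the expected result of --no-binary :all:.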

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 7
  • Comments: 18 (6 by maintainers)

Most upvoted comments

Hi @tejaslodaya - glad you found a workaround, but it is just that - a workaround. That said, I’m going to keep this open.

Hi @andrewsmartin and @charlesccychen

I managed to solve this issue by doing these steps:

  1. Go to site-packages inside your virtual environment and open the apache_beam/runners/portability/stager.py file.
  2. In the _populate_requirements_cache function, remove these two lines: '--no-binary', ':all:'
  3. Reload the package in your Jupyter notebook or main script.

In my case, I had created a conda environment named tfx_test, so the file I changed was ~/miniconda3/envs/tfx_test/lib/python3.7/site-packages/apache_beam/runners/portability/stager.py.

This solves the issue.
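The effect of the edit in step 2 can be sketched without touching Beam itself. The helper below is hypothetical (strip_no_binary is not part of Beam), and example_args only approximates the pip download command that the stager builds. Check stager.py in your installed Beam version for the real list.

```python
def strip_no_binary(cmd_args):
    """Return a copy of a pip argument list without '--no-binary <value>'."""
    cleaned = []
    skip_value = False
    for arg in cmd_args:
        if skip_value:
            # This is the value that followed '--no-binary' (e.g. ':all:').
            skip_value = False
            continue
        if arg == "--no-binary":
            skip_value = True
            continue
        cleaned.append(arg)
    return cleaned


# Approximation of the stager's pip command; paths are placeholders.
example_args = [
    "python", "-m", "pip", "download",
    "--dest", "/tmp/requirements-cache",
    "-r", "requirements.txt",
    "--exists-action", "i",
    "--no-binary", ":all:",
]

patched_args = strip_no_binary(example_args)
```

Without those two arguments pip is free to download wheels, which is why tfx resolves. The trade-off is that wheels fetched on the local machine may not match the platform of the Dataflow workers, so this remains a workaround rather than a fix.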