tfx: Dataflow workers not able to install tfx from requirements file due to `no-binary` option from beam stager
When no Beam packaging arguments are provided by the user, TFX generates a requirements file with the tfx package inside.
This ends up failing on Dataflow, because the Beam stager uses pip’s --no-binary flag: https://github.com/apache/beam/blob/v2.15.0/sdks/python/apache_beam/runners/portability/stager.py#L483.
Indeed, in a fresh virtualenv (Python 3.6.3):
pip download tfx==0.14.0 --no-binary :all:
Collecting tfx==0.14.0
ERROR: Could not find a version that satisfies the requirement tfx==0.14.0 (from versions: none)
ERROR: No matching distribution found for tfx==0.14.0
Whereas if I remove the --no-binary flag, it works just fine.
I’m not all that knowledgeable about Python packaging, but is this because TFX is built as a wheel? Is there some Beam option I can pass to make this work?
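One way to check the wheel-only hypothesis is to look at which file types a release publishes. The sketch below assumes the PyPI JSON API response shape (a `urls` list whose entries carry a `packagetype` such as `sdist` or `bdist_wheel`). The helper names `has_sdist` and `fetch_release_metadata` and the sample metadata are hypothetical and are not real tfx release data.

```python
import json
import urllib.request


def has_sdist(release_metadata):
    """Return True if any file in the release is a source distribution."""
    return any(
        f.get("packagetype") == "sdist"
        for f in release_metadata.get("urls", [])
    )


def fetch_release_metadata(project, version):
    """Fetch release metadata from the PyPI JSON API (needs network access)."""
    url = "https://pypi.org/pypi/{}/{}/json".format(project, version)
    with urllib.request.urlopen(url) as response:
        return json.load(response)


# Hypothetical, hand-written metadata for illustration only.
wheel_only = {"urls": [{"packagetype": "bdist_wheel",
                        "filename": "example-1.0-py3-none-any.whl"}]}
with_source = {"urls": [{"packagetype": "sdist",
                         "filename": "example-1.0.tar.gz"}]}

print(has_sdist(wheel_only))   # False
print(has_sdist(with_source))  # True
```

If `has_sdist(fetch_release_metadata("tfx", "0.14.0"))` returned False, `--no-binary :all:` would leave pip with no candidates to download, which would be consistent with the "from versions: none" error above.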
About this issue
- State: closed
- Created 5 years ago
- Reactions: 7
- Comments: 18 (6 by maintainers)
Hi @tejaslodaya - glad you found a workaround, but it is just that - a workaround. That said, I’m going to keep this open.
Hi @andrewsmartin and @charlesccychen
I managed to solve this issue by doing these steps:
1. In the `_populate_requirements_cache` function, remove these two lines: '--no-binary', ':all:'.
2. In my case, I had created a conda environment and changed this file: ~/miniconda3/envs/tfx_test/lib/python3.7/site-packages/apache_beam/runners/portability/stager.py, where my environment name is tfx_test.
This solves the issue.
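To make the effect of that edit concrete, here is a simplified sketch of the kind of `pip download` argument list the stager builds. It is an approximation, not a copy of Beam's code: `build_pip_download_args` and its `source_only` parameter are hypothetical, and only the `--no-binary`, `:all:` pair is taken from this thread.

```python
import sys


def build_pip_download_args(requirements_file, cache_dir, source_only=True):
    """Build a pip download command (hypothetical helper, not Beam's API)."""
    args = [
        sys.executable, "-m", "pip", "download",
        "--dest", cache_dir,
        "-r", requirements_file,
    ]
    if source_only:
        # These are the two entries the workaround removes.
        args += ["--no-binary", ":all:"]
    return args


default_args = build_pip_download_args("requirements.txt", "/tmp/cache")
patched_args = build_pip_download_args(
    "requirements.txt", "/tmp/cache", source_only=False)

print(default_args[-2:])              # ['--no-binary', ':all:']
print("--no-binary" in patched_args)  # False
```

Because the edit lives inside site-packages, reinstalling or upgrading apache-beam in that environment will undo it.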