tfx: TFX >= 1.4.0 fails with S3 as backend due to tensorflow-io not being imported

TFX >= 1.4.0 fails with S3 as backend because tensorflow-io is not imported. Up to TensorFlow 2.5.*, the additional filesystems were part of TensorFlow itself, but from TF 2.6 they have been moved to tensorflow-io. However, tensorflow-io isn’t imported in tfx/orchestration/kubeflow/container_entrypoint.py and hence the S3 filesystem (and several others) can’t be used.

  • Have I specified the code to reproduce the issue (Yes, No): No
  • Environment in which the code is executed (e.g., Local(Linux/MacOS/Windows), Interactive Notebook, Google Cloud, etc): KubeFlow, Ubuntu image
  • TensorFlow version: 2.7
  • TFX Version: 1.5
  • Python version: 3.7
  • Python dependencies (from pip freeze output):

Describe the current behavior TFX >= 1.4.0 fails with S3 as backend due to tensorflow-io not being imported

Describe the expected behavior S3 filesystem should work.

Standalone code to reproduce the issue Any simple pipeline which uses s3 as storage backend.

Other info / logs

INFO:absl:Going to run a new execution 27735
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/pyenv/lib/python3.7/site-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 476, in <module>
    main(sys.argv[1:])
  File "/root/pyenv/lib/python3.7/site-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 468, in main
    execution_info = component_launcher.launch()
  File "/root/pyenv/lib/python3.7/site-packages/tfx/orchestration/portable/launcher.py", line 524, in launch
    execution_preparation_result = self._prepare_execution()
  File "/root/pyenv/lib/python3.7/site-packages/tfx/orchestration/portable/launcher.py", line 384, in _prepare_execution
    self._output_resolver.get_executor_output_uri(execution.id)),
  File "/root/pyenv/lib/python3.7/site-packages/tfx/orchestration/portable/outputs_utils.py", line 169, in get_executor_output_uri
    fileio.makedirs(execution_dir)
  File "/root/pyenv/lib/python3.7/site-packages/tfx/dsl/io/fileio.py", line 80, in makedirs
    _get_filesystem(path).makedirs(path)
  File "/root/pyenv/lib/python3.7/site-packages/tfx/dsl/io/plugins/tensorflow_gfile.py", line 71, in makedirs
    tf.io.gfile.makedirs(path)
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow/python/lib/io/file_io.py", line 515, in recursive_create_dir_v2
    _pywrap_file_io.RecursivelyCreateDir(compat.path_to_bytes(path))
tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme 's3' not implemented (file: 's3://pipelines/tfx/trace_model_pipeline/TimeBasedExampleGen/.system/executor_execution/27735')

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 4
  • Comments: 44 (22 by maintainers)

Most upvoted comments

The root cause is https://github.com/tensorflow/tensorflow/issues/51583. TF dropped s3 / HDFS support as of 2.6 and I believe that all our packages are affected by this. We could support s3 by importing the tensorflow_io dependency in the repo.

This fix could potentially be included in the next release.
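For anyone who needs a workaround in their own entrypoint or component code before a fixed release, here is a minimal sketch of the "import tensorflow_io" idea. It assumes that importing the tensorflow_io package is what registers the extra filesystem schemes with TensorFlow, as described in this thread. The helper name register_extra_filesystems is hypothetical and is not part of TFX or tensorflow-io.

```python
import importlib
import logging


def register_extra_filesystems(module_name: str = "tensorflow_io") -> bool:
    """Import a plugin package purely for its import-time side effects.

    Hypothetical helper (not a TFX API). Returns True if the import
    succeeded and False if the package is not installed.
    """
    try:
        # Assumption from the thread: importing tensorflow_io makes schemes
        # such as s3:// available to tf.io.gfile in this process.
        importlib.import_module(module_name)
    except ImportError:
        logging.warning(
            "%s is not installed; remote schemes such as s3:// may be unavailable",
            module_name,
        )
        return False
    return True
```

The call has to happen in every process that touches the remote path, before the first tf.io.gfile call, which is why the import location matters in the comments below.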

@varshaan Mostly great news: the issue in tf Transform seems to be resolved!

@jiyongjung0 Slightly worse news: a similar issue is still present in Evaluator (see log below). Can someone look at this ASAP? This is the same issue as in Transform before, so a simple import tensorflow_io will probably do the trick.

This is the final component so when this is fixed, TFX is officially S3 certified again.

Traceback (most recent call last):
  File "/root/pyenv/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 267, in _execute
    response = task()
  File "/root/pyenv/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 340, in <lambda>
    lambda: self.create_worker().do_instruction(request), request)
  File "/root/pyenv/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 581, in do_instruction
    getattr(request, request_type), request.instruction_id)
  File "/root/pyenv/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 612, in process_bundle
    instruction_id, request.process_bundle_descriptor_id)
  File "/root/pyenv/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 445, in get
    self.data_channel_factory)
  File "/root/pyenv/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py", line 865, in __init__
    op.setup()
  File "apache_beam/runners/worker/operations.py", line 648, in apache_beam.runners.worker.operations.DoOperation.setup
  File "apache_beam/runners/worker/operations.py", line 697, in apache_beam.runners.worker.operations.DoOperation.setup
  File "apache_beam/runners/common.py", line 1245, in apache_beam.runners.common.DoFnRunner.setup
  File "apache_beam/runners/common.py", line 1241, in apache_beam.runners.common.DoFnRunner._invoke_lifecycle_method
  File "apache_beam/runners/common.py", line 1281, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 1239, in apache_beam.runners.common.DoFnRunner._invoke_lifecycle_method
  File "apache_beam/runners/common.py", line 465, in apache_beam.runners.common.DoFnInvoker.invoke_setup
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow_model_analysis/utils/model_util.py", line 863, in setup
    super().setup()
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow_model_analysis/utils/model_util.py", line 678, in setup
    model_load_time_callback=self._set_model_load_seconds)
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow_model_analysis/types.py", line 305, in load
    return self._shared_handle.acquire(construct_fn)
  File "/root/pyenv/lib/python3.7/site-packages/apache_beam/utils/shared.py", line 312, in acquire
    return _shared_map.acquire(self._key, constructor_fn, tag)
  File "/root/pyenv/lib/python3.7/site-packages/apache_beam/utils/shared.py", line 253, in acquire
    result = control_block.acquire(constructor_fn, tag)
  File "/root/pyenv/lib/python3.7/site-packages/apache_beam/utils/shared.py", line 146, in acquire
    result = constructor_fn()
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow_model_analysis/types.py", line 314, in with_load_times
    model = self.construct_fn()
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow_model_analysis/utils/model_util.py", line 654, in construct_fn
    model = tf.compat.v1.saved_model.load_v2(eval_saved_model_path, tags=tags)
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow/python/saved_model/load.py", line 936, in load
    result = load_internal(export_dir, tags, options)["root"]
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow/python/saved_model/load.py", line 949, in load_internal
    loader_impl.parse_saved_model_with_debug_info(export_dir))
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow/python/saved_model/loader_impl.py", line 57, in parse_saved_model_with_debug_info
    saved_model = parse_saved_model(export_dir)
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow/python/saved_model/loader_impl.py", line 98, in parse_saved_model
    if file_io.file_exists(path_to_pb):
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow/python/lib/io/file_io.py", line 296, in file_exists
    return file_exists_v2(filename)
  File "/root/pyenv/lib/python3.7/site-packages/tensorflow/python/lib/io/file_io.py", line 288, in file_exists_v2
    _pywrap_file_io.FileExists(compat.path_to_bytes(path))
RuntimeError: tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme 's3' not implemented (file: 's3://pipelines/tfx/trace_model_pipeline/Trainer.time-splitted/model/1261998/Format-Serving/saved_model.pb') [while running 'ExtractEvaluateAndWriteResults/ExtractAndEvaluate/ExtractTransformedFeatures/Predict']

@ConverJens - Could you try with a nightly post https://github.com/tensorflow/transform/commit/6f082654050dc6b49b8c3e2549445487c30f3c75 and let me know if that works?

@jiyongjung0 @varshaan Indeed, I used TFMA 0.37.0 and when upgrading to 0.38.0 it worked! Thank you very much for your time and effort! I consider this issue closed.

I’m still working on it. I’ll have something out by early next week.

@varshaan It did. Moreover, TFX 1.6 also works if one force-installs tensorflow 2.5.1, which is the last version where filesystem support was still a part of tensorflow.
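To make the version boundary concrete, here is a small sketch that encodes the rule of thumb from this thread: s3 / HDFS paths need tensorflow-io from TF 2.6 onward, while 2.5.x still had the support built in. The function name needs_tensorflow_io and the scheme set are illustrative assumptions based only on what is said here, not an official compatibility table.

```python
from urllib.parse import urlparse

# Schemes named in this thread as having moved out of core TensorFlow.
# Illustrative only; other schemes may be affected as well.
MOVED_SCHEMES = {"s3", "hdfs"}


def needs_tensorflow_io(tf_version: str, uri: str) -> bool:
    """Return True if `uri` likely needs tensorflow-io on this TF version."""
    scheme = urlparse(uri).scheme
    # Compare only major.minor, e.g. "2.7.0" -> (2, 7).
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    return scheme in MOVED_SCHEMES and (major, minor) >= (2, 6)
```

With the versions from this issue, needs_tensorflow_io("2.7.0", "s3://pipelines/tfx") is true and the same path on "2.5.1" is false.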

@ConverJens Thank you for the explanation. I think that your insight is correct. But the use of the TF API might be hard to change because TF-Transform cannot depend on TFX.

It seems like Beam calls the transform libraries in a separate worker process and tensorflow_io is not imported in it. We might need to add an import in TFT (for example, similar to what TFX did). (CC @varshaan )