datasets: SSL CA failures when using a dataset from gs://

Short description

When trying to load a dataset, I get the error “Problem with the SSL CA cert (path? access rights?)”, followed by a second error when six.reraise is called.

Environment information

  • Operating System: RHEL 7
  • Python version: 3.7.4
  • tensorflow-datasets version: 3.2.1
  • tensorflow version: 2.2.0

Reproduction instructions

  • In Python, after import tensorflow_datasets as tfds, run tfds.builder("imagenet2012").info

Link to logs

Traceback (most recent call last):
  File "/home/s3248973/.local/lib/python3.7/site-packages/tensorflow_datasets/core/utils/py_utils.py", line 399, in try_reraise
    yield
  File "/home/s3248973/.local/lib/python3.7/site-packages/tensorflow_datasets/core/registered.py", line 244, in builder
    return builder_cls(name)(**builder_kwargs)
  File "/home/s3248973/.local/lib/python3.7/site-packages/tensorflow_datasets/core/api_utils.py", line 69, in disallow_positional_args_dec
    return fn(*args, **kwargs)
  File "/home/s3248973/.local/lib/python3.7/site-packages/tensorflow_datasets/core/dataset_builder.py", line 206, in __init__
    self.info.initialize_from_bucket()
  File "/home/s3248973/.local/lib/python3.7/site-packages/tensorflow_datasets/core/dataset_info.py", line 423, in initialize_from_bucket
    data_files = gcs_utils.gcs_dataset_info_files(self.full_name)
  File "/home/s3248973/.local/lib/python3.7/site-packages/tensorflow_datasets/core/utils/gcs_utils.py", line 71, in gcs_dataset_info_files
    return gcs_listdir(posixpath.join(GCS_DATASET_INFO_DIR, dataset_dir))
  File "/home/s3248973/.local/lib/python3.7/site-packages/tensorflow_datasets/core/utils/gcs_utils.py", line 64, in gcs_listdir
    if is_gcs_disabled() or not tf.io.gfile.exists(root_dir):
  File "/sw/installed/TensorFlow/2.1.0-fosscuda-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/lib/io/file_io.py", line 280, in file_exists_v2
    pywrap_tensorflow.FileExists(compat.as_bytes(path))
tensorflow.python.framework.errors_impl.AbortedError: All 10 retry attempts failed. The last failure: Unavailable: Error executing an HTTP request: libcurl code 77 meaning 'Problem with the SSL CA cert (path? access rights?)', error details: error setting certificate verify locations:
  CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: none
	 when reading metadata of gs://tfds-data/dataset_info/imagenet2012/5.0.0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "hvd_dnn_benchmark.py", line 231, in <module>
    run()  #pylint: disable=no-value-for-parameter
  File "/sw/installed/Python/3.7.4-GCCcore-8.3.0/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/sw/installed/Python/3.7.4-GCCcore-8.3.0/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/sw/installed/Python/3.7.4-GCCcore-8.3.0/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/sw/installed/Python/3.7.4-GCCcore-8.3.0/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "hvd_dnn_benchmark.py", line 105, in run
    dataset = get_dataset(dataset, synthetic=synthetic_data)
  File "/home/h3/s3248973/git/tensorflow_tests/benchmark/datasets.py", line 87, in get_dataset
    return _AVAIL[name](synthetic)
  File "/home/h3/s3248973/git/tensorflow_tests/benchmark/datasets.py", line 77, in _imagenet
    return TFDS_Dataset('imagenet2012', synthetic)
  File "/home/h3/s3248973/git/tensorflow_tests/benchmark/datasets.py", line 54, in __init__
    info = tfds.builder(name).info
  File "/home/s3248973/.local/lib/python3.7/site-packages/tensorflow_datasets/core/registered.py", line 244, in builder
    return builder_cls(name)(**builder_kwargs)
  File "/sw/installed/Python/3.7.4-GCCcore-8.3.0/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/s3248973/.local/lib/python3.7/site-packages/tensorflow_datasets/core/utils/py_utils.py", line 401, in try_reraise
    reraise(*args, **kwargs)
  File "/home/s3248973/.local/lib/python3.7/site-packages/tensorflow_datasets/core/utils/py_utils.py", line 392, in reraise
    six.reraise(exc_type, exc_type(msg), exc_traceback)
TypeError: __init__() missing 2 required positional arguments: 'op' and 'message'
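
The second traceback ends in a TypeError because reraise calls exc_type(msg) with a single argument, while the constructor of the original exception evidently needs more than that. A minimal sketch of that failure mode, without TensorFlow: OpError, naive_reraise and try_naive_reraise are hypothetical stand-ins, and the three-argument signature is an assumption inferred from the "missing 'op' and 'message'" text in the traceback.

```python
class OpError(Exception):
    """Hypothetical stand-in for an exception whose constructor needs three arguments."""

    def __init__(self, node_def, op, message):
        super().__init__(message)
        self.node_def = node_def
        self.op = op
        self.message = message


def naive_reraise(exc, prefix):
    """Rebuild the exception from one message string, like exc_type(msg) in the traceback."""
    exc_type = type(exc)
    # The single string is bound to node_def, so op and message are missing.
    return exc_type(prefix + str(exc))


def try_naive_reraise():
    """Return the text of the TypeError raised while rebuilding the exception."""
    try:
        raise OpError(None, None, "libcurl code 77")
    except OpError as err:
        try:
            naive_reraise(err, "Failed to construct dataset: ")
        except TypeError as type_err:
            return str(type_err)
    return None


if __name__ == "__main__":
    print(try_naive_reraise())
```

In this sketch the TypeError replaces the original exception, which matches how the SSL error above ends up reported only as the context of the second traceback.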

Expected behavior

No error

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 18 (17 by maintainers)

Most upvoted comments

Here is a workaround for this problem: downgrade tensorflow-datasets.

pip install tensorflow-datasets==3.0.0

None of the other solutions worked for me; only this one did.

Hi @Flamefire,

Can you run this code snippet on your machine

import tensorflow as tf
tf.io.gfile.exists("gs://tfds-data/dataset_info/mnist/3.0.1")

and share the results?

(Similar issue https://github.com/tensorflow/datasets/issues/2190)
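
The libcurl error in the traceback names /etc/ssl/certs/ca-certificates.crt as the CA file, which is the Debian/Ubuntu location; RHEL 7 normally keeps its bundle at /etc/pki/tls/certs/ca-bundle.crt. A small diagnostic sketch for checking which bundle files are actually present and readable on the machine: the candidate list and the function name find_readable_ca_bundles are illustrative assumptions, not part of TensorFlow or tensorflow-datasets.

```python
import os

# Common CA bundle locations; this list is illustrative, not exhaustive.
CANDIDATE_CA_BUNDLES = [
    "/etc/ssl/certs/ca-certificates.crt",  # Debian/Ubuntu; the path named in the error
    "/etc/pki/tls/certs/ca-bundle.crt",    # RHEL/CentOS/Fedora
    "/etc/ssl/ca-bundle.pem",              # openSUSE
]


def find_readable_ca_bundles(candidates=CANDIDATE_CA_BUNDLES):
    """Return the candidate paths that exist as regular files and are readable."""
    return [
        path for path in candidates
        if os.path.isfile(path) and os.access(path, os.R_OK)
    ]


if __name__ == "__main__":
    readable = find_readable_ca_bundles()
    for path in CANDIDATE_CA_BUNDLES:
        status = "readable" if path in readable else "missing or unreadable"
        print(f"{path}: {status}")
```

If the path from the error is missing but another bundle is readable, one thing to try is pointing the CURL_CA_BUNDLE environment variable at the readable file before importing TensorFlow. Whether the installed TensorFlow build honors that variable is an assumption to verify, not something this issue confirms.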