tensorflow: tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme 'gs' not implemented (file: 'gs://tfds-data/datasets/mnist')

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): It is an example script (for distributed training) provided in TensorFlow
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below): v2.2.0-rc2-77-gaad398b5e9 2.2.0-rc3
  • Python version: 3.6.9
  • Bazel version (if compiling from source): 2.0.0
  • GCC/Compiler version (if compiling from source): 7.5.0
  • CUDA/cuDNN version: 10.2 / 7.6.5.32-1
  • GPU model and memory: NVIDIA GeForce 940MX with 2 GB Dedicated VRAM
  • Exact command to reproduce: python3 distributed_training.py

Describe the problem

I built TensorFlow 2.2 from source (the r2.2 branch) with CUDA 10.2 and cuDNN 7.6.5 support on Ubuntu 18.04 for Python 3. During the build configuration there was no question like "Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]". After installing the built wheel and running a script that needs to access the gs://tfds-data/datasets/mnist data, I got the following error: tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme 'gs' not implemented (file: 'gs://tfds-data/datasets/mnist'). Please advise. I am using tensorflow-datasets 2.1.0; I am not sure whether this version (against TensorFlow 2.2) could be the cause of the problem.
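As a stopgap until the build includes GCS support, one option is to fall back to a local data_dir (so TFDS downloads the data instead of reading it from the bucket) whenever the URL scheme is unsupported. A minimal sketch of that idea; `pick_data_dir` and `scheme_supported` are hypothetical names, and in a real script the check would be a `tf.io.gfile` probe wrapped in a try/except on UnimplementedError:

```python
from urllib.parse import urlparse


def pick_data_dir(preferred, fallback, scheme_supported):
    """Return `preferred` if its URL scheme is supported, else `fallback`.

    `scheme_supported` is a callable taking a scheme string ('gs', 'file', ...)
    so the check can be swapped for a real tf.io.gfile probe.
    """
    scheme = urlparse(preferred).scheme or "file"
    return preferred if scheme_supported(scheme) else fallback


# With a TF build lacking GCS support, only local paths work:
print(pick_data_dir("gs://tfds-data/datasets", "~/tensorflow_datasets",
                    lambda s: s in {"file", ""}))  # ~/tensorflow_datasets
```

The chosen directory would then be passed to `tfds.load(..., data_dir=...)` in place of the hard-coded gs:// path.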

Source code / logs

Source code of the script:

from __future__ import absolute_import, division, print_function, unicode_literals
import os

import tensorflow_datasets as tfds
import tensorflow as tf

tfds.disable_progress_bar()


def evaluate_and_get_model(pth):
    mdl = tf.keras.models.load_model(pth, compile=False)
    mdl.compile(loss='sparse_categorical_crossentropy',
                optimizer=tf.keras.optimizers.Adam(),
                metrics=['accuracy'])
    evl_loss, evl_acc = mdl.evaluate(eval_dataset)
    print('Eval loss: {}, Eval Accuracy: {}'.format(evl_loss, evl_acc))
    return mdl


# Function for decaying the learning rate.
# You can define any decay function you need.
def decay(epoch):
    if epoch < 3:
        return 1e-3
    elif 3 <= epoch < 7:
        return 1e-4
    else:
        return 1e-5


def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255

    return image, label


# Callback for printing the LR at the end of each epoch.
class PrintLR(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        print('\nLearning rate for epoch {} is {}'.format(epoch + 1, model.optimizer.lr.numpy()))


print(tf.__version__)

datasets, info = tfds.load(name='mnist', with_info=True, as_supervised=True, data_dir='gs://tfds-data/datasets')

mnist_train, mnist_test = datasets['train'], datasets['test']

strategy = tf.distribute.MirroredStrategy()

print('Number of devices: {}'.format(strategy.num_replicas_in_sync))

# You can also do info.splits.total_num_examples to get the total
# number of examples in the dataset.

num_train_examples = info.splits['train'].num_examples
num_test_examples = info.splits['test'].num_examples

BUFFER_SIZE = 10000

BATCH_SIZE_PER_REPLICA = 64
BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync

train_dataset = mnist_train.map(scale).cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
eval_dataset = mnist_test.map(scale).batch(BATCH_SIZE)

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])

    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer=tf.keras.optimizers.Adam(),
                  metrics=['accuracy'])

# Define the checkpoint directory to store the checkpoints

checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, 'ckpt_{epoch}')

callbacks = [
    tf.keras.callbacks.TensorBoard(log_dir='./logs'),
    tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_prefix,
                                       save_weights_only=True),
    tf.keras.callbacks.LearningRateScheduler(decay),
    PrintLR()
]

model.fit(train_dataset, epochs=12, callbacks=callbacks)

model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))

eval_loss, eval_acc = model.evaluate(eval_dataset)

print('Eval loss: {}, Eval Accuracy: {}'.format(eval_loss, eval_acc))

path = 'saved_model/'

model.save(path, save_format='tf')

unreplicated_model = evaluate_and_get_model(path)

unreplicated_model.save(path, save_format='tf')

with strategy.scope():
    evaluate_and_get_model(path)

Command line output of running "python3 distributed_training.py":

2.2.0-rc3
ERROR:absl:Failed to construct dataset mnist
Traceback (most recent call last):
  File "distributed_training.py", line 48, in <module>
    datasets, info = tfds.load(name='mnist', with_info=True, as_supervised=True, data_dir='gs://tfds-data/datasets')
  File "/home/vyepishov/.local/lib/python3.6/site-packages/tensorflow_datasets/core/api_utils.py", line 52, in disallow_positional_args_dec
    return fn(*args, **kwargs)
  File "/home/vyepishov/.local/lib/python3.6/site-packages/tensorflow_datasets/core/registered.py", line 302, in load
    dbuilder = builder(name, data_dir=data_dir, **builder_kwargs)
  File "/home/vyepishov/.local/lib/python3.6/site-packages/tensorflow_datasets/core/registered.py", line 172, in builder
    return _DATASET_REGISTRY[name](**builder_kwargs)
  File "/home/vyepishov/.local/lib/python3.6/site-packages/tensorflow_datasets/core/api_utils.py", line 52, in disallow_positional_args_dec
    return fn(*args, **kwargs)
  File "/home/vyepishov/.local/lib/python3.6/site-packages/tensorflow_datasets/core/dataset_builder.py", line 197, in __init__
    self._data_dir = self._build_data_dir()
  File "/home/vyepishov/.local/lib/python3.6/site-packages/tensorflow_datasets/core/dataset_builder.py", line 661, in _build_data_dir
    version_dirs = _other_versions_on_disk()
  File "/home/vyepishov/.local/lib/python3.6/site-packages/tensorflow_datasets/core/dataset_builder.py", line 648, in _other_versions_on_disk
    if not tf.io.gfile.exists(builder_data_dir):
  File "/home/vyepishov/.local/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 267, in file_exists_v2
    _pywrap_file_io.FileExists(compat.as_bytes(path))
tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme 'gs' not implemented (file: 'gs://tfds-data/datasets/mnist')

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 6
  • Comments: 28 (11 by maintainers)


Most upvoted comments

Any update on GCS support for Windows?

I don’t think the GCS filesystem is compiled for Windows in TF 2.3.

We are planning to make filesystems modular (https://github.com/tensorflow/community/pull/101) but this will likely land in TF 2.5.

TF 2.4 should contain work done over the summer as part of Google Summer of Code that would enable GCS filesystems on Windows.

Any link to a tutorial or information on doing that? Most people getting this error are running the official TensorFlow tutorial. It may be a good idea to put it there as well.

You need to install tensorflow-io for the other filesystems. See #51583 for similar issue
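The suggestion above can be sketched as a small guard at the top of the script: importing tensorflow-io registers extra filesystem schemes (including gs://) with TensorFlow. `ensure_gs_support` is a hypothetical helper name for illustration:

```python
def ensure_gs_support():
    """Try to register the gs:// filesystem via tensorflow-io.

    Returns True if tensorflow_io imported (extra filesystem schemes
    registered), False if the package is missing.
    """
    try:
        import tensorflow_io  # noqa: F401  (side effect: registers schemes)
        return True
    except ImportError:
        return False


if not ensure_gs_support():
    print("tensorflow-io not installed; run: pip install tensorflow-io")
```

With the import in place, `tfds.load(..., data_dir='gs://tfds-data/datasets')` should no longer hit the UnimplementedError on builds that lack compiled-in GCS support.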

Error still persists: RuntimeError: UnimplementedError

Could you try with the latest versions of TFDS and TF?

I downgraded tensorflow_datasets from 3.2.1 to 3.1.0, and the issue disappeared.