tensorflow: Windows - tensorflow.python.framework.errors_impl.UnknownError: Failed to rename:

System information

  • OS Name: Microsoft Windows 10 Enterprise
  • OS Version: 10.0.17763 N/A Build 17763
  • TensorFlow installed using ‘conda’.
  • tensorflow v2.2.0-rc4-8-g2b96f3662b 2.2.0
  • Python 3.6.10 |Anaconda, Inc.| (default, Jan 7 2020, 15:18:16) [MSC v.1916 64 bit (AMD64)] on win32

Describe the current behavior

Saving checkpoint files from tensorflow is failing on Windows 10.

Traceback (most recent call last):
  File "C:\Users\<redacted>\Miniconda3\envs\<redacted>\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\<redacted>\Miniconda3\envs\<redacted>\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Git\<redacted>\tests\integration\validate_train_model.py", line 216, in <module>
    main()
  File "C:\Git\<redacted>\tests\integration\validate_train_model.py", line 176, in main
    fig_save_freq = fig_save_freq)
  File "c:\git\<redacted>\src\pointnet\model.py", line 640, in fit
    self.save_best_model()
  File "c:\git\<redacted>\src\pointnet\model.py", line 493, in save_best_model
    check_interval = False)
  File "C:\Users\<redacted>\Miniconda3\envs\<redacted>\lib\site-packages\tensorflow\python\training\checkpoint_management.py", line 823, in save
    self._record_state()
  File "C:\Users\<redacted>\Miniconda3\envs\<redacted>\lib\site-packages\tensorflow\python\training\checkpoint_management.py", line 728, in _record_state
    save_relative_paths=True)
  File "C:\Users\<redacted>\Miniconda3\envs\<redacted>\lib\site-packages\tensorflow\python\training\checkpoint_management.py", line 248, in update_checkpoint_state_internal
    text_format.MessageToString(ckpt))
  File "C:\Users\<redacted>\Miniconda3\envs\<redacted>\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 532, in atomic_write_string_to_file
    rename(temp_pathname, filename, overwrite)
  File "C:\Users\<redacted>\Miniconda3\envs\<redacted>\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 491, in rename
    rename_v2(oldname, newname, overwrite)
  File "C:\Users\<redacted>\Miniconda3\envs\<redacted>\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 508, in rename_v2
    compat.as_bytes(src), compat.as_bytes(dst), overwrite)
tensorflow.python.framework.errors_impl.UnknownError: Failed to rename: tests\files\checkpoints\0000_00_00_00_00_00\checkpoint.tmpc6ee5d6bc5a445c884bba8c3acadf01f to: tests\files\checkpoints\0000_00_00_00_00_00\checkpoint : Access is denied.
; Input/output error

Problem traced to: tensorflow.python.lib.io.file_io, line 532, function atomic_write_string_to_file

From debugging, TensorFlow writes a temporary file and then tries to rename it over the existing ‘checkpoint’ file while saving a checkpoint. For some reason, the ‘overwrite’ parameter, although set to True, does nothing. This causes the rename to fail (since the target file seems to get created earlier in the checkpoint save process).

We tried deleting the ‘checkpoint’ file before the ‘save’, but the checkpoint file that it’s trying to overwrite appears to be created as a part of the ‘save’ call.

I was able to get checkpoint saving working again by modifying atomic_write_string_to_file as follows. My change checks for existence of the rename target and deletes it using os.remove if overwrite is True, rather than relying on the tensorflow custom machinery that doesn’t seem to be working:

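def atomic_write_string_to_file(filename, contents, overwrite=True):
  if not has_atomic_move(filename):
    write_string_to_file(filename, contents)
  else:
    # Write to a uniquely named temp file, then move it into place.
    temp_pathname = filename + ".tmp" + uuid.uuid4().hex
    write_string_to_file(temp_pathname, contents)
    try:
      # Workaround: delete an existing target ourselves instead of
      # relying on rename's overwrite handling.
      if overwrite and os.path.exists(filename):
        os.remove(filename)
      rename(temp_pathname, filename, overwrite)
    except errors.OpError:
      # Clean up the temp file before re-raising.
      delete_file(temp_pathname)
      raise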
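
For comparison, here is a minimal standard-library sketch of the same write-then-rename pattern, independent of TensorFlow. It relies on Python's os.replace, which overwrites an existing destination file, whereas os.rename raises an error on Windows if the destination already exists. The name atomic_write_text is hypothetical and is not part of TensorFlow.

```python
import os
import uuid


def atomic_write_text(filename, contents):
    """Write contents to a temp file, then move it over filename."""
    # Hypothetical helper for illustration; not TensorFlow's implementation.
    temp_pathname = filename + ".tmp" + uuid.uuid4().hex
    with open(temp_pathname, "w", encoding="utf-8") as f:
        f.write(contents)
    try:
        # os.replace overwrites an existing destination file.
        os.replace(temp_pathname, filename)
    except OSError:
        # Clean up the temp file if the move fails, then re-raise.
        if os.path.exists(temp_pathname):
            os.remove(temp_pathname)
        raise
```

This can still fail if the destination is a directory or if another process holds the destination open, so it does not cover every cause reported in this thread.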

The stack trace suggests that this is the same issue that was reported for tensorflow/models: https://github.com/tensorflow/models/issues/4177

Describe the expected behavior

We should be able to successfully save a checkpoint on Windows 10.

About this issue

  • State: open
  • Created 4 years ago
  • Reactions: 9
  • Comments: 27 (1 by maintainers)

Most upvoted comments

I am getting this same error when trying to train a custom object detection model using TF2:

tensorflow.python.framework.errors_impl.UnknownError: Failed to rename: training\checkpoint.tmpa7a5285bf7fa4fc1861942fc88f3e099 to: training\checkpoint : Access is denied. ; Input/output error

tensorflow.python.framework.errors_impl.UnknownError: Failed to rename: training\checkpoint.tmpa7a5285bf7fa4fc1861942fc88f3e099 to: training\checkpoint : Access is denied. ; Input/output error

I had this problem too. In my case a directory named “training\checkpoint” already existed, so it was impossible to rename any file to that name.

What did you do to solve it? Any suggestions?

Just rename the “checkpoint” directory to something else, “cp” for example (remember to update the paths in pipeline.config).
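
A quick way to check for this particular cause before training is sketched below. It assumes, as the error messages above show, that the rename target is a file called “checkpoint” inside the training directory; the function name is hypothetical.

```python
import os


def checkpoint_name_is_directory(train_dir, name="checkpoint"):
    """Return True if train_dir/name exists and is a directory."""
    # The failing rename targets a *file* called "checkpoint"; a directory
    # with the same name in the same place would block it.
    return os.path.isdir(os.path.join(train_dir, name))
```

For example, checkpoint_name_is_directory("training") returning True would point to the directory clash described above.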

I am getting this error as well.

Stackoverflow thread: https://stackoverflow.com/questions/65461750/tensorflow-python-framework-errors-impl-unknownerror-failed-to-rename-access

Code:

import tensorflow_datasets as tfds
datasets, info = tfds.load("imdb_reviews", as_supervised=True, with_info=True)

Output

Writing...:   0%|          | 0/2500 [00:00<?, ? examples/s]
Shuffling...:  90%|█████████ | 18/20 [00:01<00:00, 14.15 shard/s]
Reading...: 0 examples [00:00, ? examples/s]
                                            
Writing...:   0%|          | 0/2500 [00:00<?, ? examples/s]
                                                           
Reading...: 0 examples [00:00, ? examples/s]
                                            
Writing...:   0%|          | 0/2500 [00:00<?, ? examples/s]
Traceback (most recent call last):
  File "C:\Anaconda3\envs\ml_tf\lib\site-packages\IPython\core\interactiveshell.py", line 3418, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-3b586bfe81d7>", line 3, in <module>
    datasets, info = tfds.load("imdb_reviews", as_supervised=True, with_info=True)
  File "C:\Anaconda3\envs\ml_tf\lib\site-packages\tensorflow_datasets\core\api_utils.py", line 52, in disallow_positional_args_dec
    return fn(*args, **kwargs)
  File "C:\Anaconda3\envs\ml_tf\lib\site-packages\tensorflow_datasets\core\registered.py", line 300, in load
    dbuilder.download_and_prepare(**download_and_prepare_kwargs)
  File "C:\Anaconda3\envs\ml_tf\lib\site-packages\tensorflow_datasets\core\api_utils.py", line 52, in disallow_positional_args_dec
    return fn(*args, **kwargs)
  File "C:\Anaconda3\envs\ml_tf\lib\site-packages\tensorflow_datasets\core\dataset_builder.py", line 307, in download_and_prepare
    self.info.write_to_directory(self._data_dir)
  File "C:\Anaconda3\envs\ml_tf\lib\contextlib.py", line 119, in __exit__
    next(self.gen)
  File "C:\Anaconda3\envs\ml_tf\lib\site-packages\tensorflow_datasets\core\file_format_adapter.py", line 200, in incomplete_dir
    tf.io.gfile.rename(tmp_dir, dirname)
  File "C:\Anaconda3\envs\ml_tf\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 546, in rename_v2
    compat.as_bytes(src), compat.as_bytes(dst), overwrite)
tensorflow.python.framework.errors_impl.UnknownError: Failed to rename: C:\Users\User\tensorflow_datasets\imdb_reviews\plain_text\0.1.0.incomplete5JQVCL to: C:\Users\User\tensorflow_datasets\imdb_reviews\plain_text\0.1.0 : Access is denied.
; Input/output error


I am on Windows 10; TF 2.3; Python 3.7.9;

Here’s the relevant part of conda list:

sqlite                    3.33.0               h2a8f88b_0
tensorboard               2.3.0              pyh4dce500_0
tensorboard-plugin-wit    1.6.0                      py_0
tensorflow                2.3.0           mkl_py37h3bad0a6_0
tensorflow-base           2.3.0           eigen_py37h17acbac_0
tensorflow-datasets       1.2.0                    py37_0
tensorflow-estimator      2.3.0              pyheb71bc4_0
tensorflow-metadata       0.14.0             pyhe6710b0_1
tensorflow-mkl            2.3.0                h93d2e19_0

Can someone please help?

I had this issue, and moving my code and data outside of the Dropbox directory solved the problem. (This didn’t use to happen with Dropbox, but now it does.)

Failed to rename: path\trial_5a095c02600a30dc086a9efe046b1272\checkpoints\epoch_0\checkpoint_temp/part-00000-of-00001.data-00000-of-00001 to: path\trial_5a095c02600a30dc086a9efe046b1272\checkpoints\epoch_0\checkpoint.data-00000-of-00001

Try making this path shorter, under 255 characters.
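
To see whether path length is even a candidate, a small sketch like the following lists overly long absolute paths under a directory. The default of 255 simply mirrors the figure in the comment above; the effective limit on a given Windows machine depends on its configuration, and the function name is hypothetical.

```python
import os


def overlong_paths(root, limit=255):
    """Yield (length, path) for absolute paths under root longer than limit."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.abspath(os.path.join(dirpath, name))
            if len(full) > limit:
                yield len(full), full
```

Running it over the tuner or checkpoint output directory after a failed save shows whether any generated file names come close to the limit.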

Actually, I think I found an explanation on stack overflow that led to a solution for me ( https://stackoverflow.com/questions/41365318/access-is-denied-when-renaming-folder ).

Basically, a Python script can’t rename folders on Windows if the target directory is the folder the process is running in, or a sub-folder of that folder. I changed my training folder to c:/Temp/Training, and now it’s not in the script’s directory path at all.

This solution works for me, no more access denied errors in Windows 10.

I was having the same access denied issue. I followed this advice and it solved the issue for me with a caveat. I couldn’t use /Temp for some reason (was getting permission denied). But when I used /ProgramData/PythonTraining/my_checkpoint all errors went away.

P.S. Note that I had long file names enabled in the registry prior to this as well, and it did not help.
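
The condition described in this sub-thread (the checkpoint folder being the process's working directory or somewhere beneath it) can be checked with a short sketch like the one below. The function name is hypothetical, and the check compares paths only; it does not resolve symlinks or confirm that this is the actual cause of a given failure.

```python
import os
from pathlib import Path


def is_inside_working_dir(target, cwd=None):
    """Return True if target is the working directory or lies beneath it."""
    # Relative targets are resolved against the process's real working
    # directory; pass absolute paths when supplying a custom cwd.
    cwd_path = Path(os.path.abspath(cwd if cwd is not None else os.getcwd()))
    target_path = Path(os.path.abspath(target))
    return target_path == cwd_path or cwd_path in target_path.parents
```

If is_inside_working_dir("training") is True, moving the training folder elsewhere, as suggested above, is one thing to try.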

I can confirm the problem still exists for both TensorFlow 2.3 and 2.4 on Windows. I have tried all the recommended solutions, including modifying the atomic_write_string_to_file function as described at the top, specifying different folders for checkpoint and save, shutting down antivirus and all cloud backup services, etc. But I still ran into the “Failed to rename” error repeatedly during normal TensorFlow model training.

I guess it’s time for Linux? WSL2 is still immature and doesn’t have multi-GPU support yet. I feel that Windows is just not very loved.

I am also getting this error on: Windows 10; TF 2.3; Python 3.7

tensorflow.python.framework.errors_impl.UnknownError: Failed to rename: path\trial_5a095c02600a30dc086a9efe046b1272\checkpoints\epoch_0\checkpoint_temp/part-00000-of-00001.data-00000-of-00001 to: path\trial_5a095c02600a30dc086a9efe046b1272\checkpoints\epoch_0\checkpoint.data-00000-of-00001 : Access is denied. ; Input/output error [Op:MergeV2Checkpoints]

Are you running another process while training, such as eval, for example?
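
If another process (an eval job, antivirus, or a sync client, all raised in this thread) holds the file only briefly, retrying the save after a short pause may get past it. The following is a generic sketch under that assumption, not a confirmed fix; call_with_retries is a hypothetical helper.

```python
import time


def call_with_retries(fn, attempts=5, delay_seconds=0.5, retry_on=(Exception,)):
    """Call fn(); on a listed exception, wait and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error.
            time.sleep(delay_seconds * attempt)  # Linear backoff.
```

With TensorFlow this could wrap a save call, for instance call_with_retries(lambda: manager.save(), retry_on=(tf.errors.UnknownError,)), where manager stands for your own tf.train.CheckpointManager instance.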

I am also getting this error on: Windows 10; TF 2.3; Python 3.7

tensorflow.python.framework.errors_impl.UnknownError: Failed to rename: path\trial_5a095c02600a30dc086a9efe046b1272\checkpoints\epoch_0\checkpoint_temp/part-00000-of-00001.data-00000-of-00001 to: path\trial_5a095c02600a30dc086a9efe046b1272\checkpoints\epoch_0\checkpoint.data-00000-of-00001 : Access is denied. ; Input/output error [Op:MergeV2Checkpoints]

Unfortunately, @dtmaidenmueller’s workaround did not work for me. I am also using Windows 10 Enterprise, with Python 3.8.5, TensorFlow 2.3.0 and Keras-Tuner 1.0.1. I am also saving the results for visualization in TensorBoard. TensorFlow was installed without conda. The error started to appear only after I increased the maximum number of trials for the tuner from 30 to 150 (and above) and turned the Python script that calls keras-tuner into a function, which is now called by another Python script.

  File "...Python38\lib\site-packages\kerastuner\engine\base_tuner.py", line 130, in search 
    self.run_trial(trial, *fit_args, **fit_kwargs) 
  File "...Python38\lib\site-packages\kerastuner\engine\multi_execution_tuner.py", line 96, in run_trial 
    history = model.fit(*fit_args, **copied_fit_kwargs) 
  File "...Python38\lib\site-packages\tensorflow\python\keras\engine\training.py", line 108, in _method_wrapper 
    return method(self, *args, **kwargs) 
  File "...Python38\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1137, in fit 
    callbacks.on_epoch_end(epoch, epoch_logs) 
  File "...Python38\lib\site-packages\tensorflow\python\keras\callbacks.py", line 412, in on_epoch_end 
    callback.on_epoch_end(epoch, logs) 
  File "...Python38\lib\site-packages\tensorflow\python\keras\callbacks.py", line 1249, in on_epoch_end 
    self._save_model(epoch=epoch, logs=logs) 
  File "...Python38\lib\site-packages\tensorflow\python\keras\callbacks.py", line 1298, in _save_model 
    self.model.save_weights( 
  File "...Python38\lib\site-packages\tensorflow\python\keras\engine\training.py", line 2101, in save_weights 
    self._trackable_saver.save(filepath, session=session, options=options) 
  File "...Python38\lib\site-packages\tensorflow\python\training\tracking\util.py", line 1199, in save 
    save_path, new_feed_additions = self._save_cached_when_graph_building( 
  File "...Python38\lib\site-packages\tensorflow\python\training\tracking\util.py", line 1145, in _save_cached_when_graph_building 
    save_op = saver.save(file_prefix, options=options) 
  File "...Python38\lib\site-packages\tensorflow\python\training\saving\functional_saver.py", line 295, in save 
    return save_fn() 
  File "...Python38\lib\site-packages\tensorflow\python\training\saving\functional_saver.py", line 281, in save_fn 
    return gen_io_ops.merge_v2_checkpoints( 
  File "...Python38\lib\site-packages\tensorflow\python\ops\gen_io_ops.py", line 504, in merge_v2_checkpoints 
    return merge_v2_checkpoints_eager_fallback( 
  File "...Python38\lib\site-packages\tensorflow\python\ops\gen_io_ops.py", line 529, in merge_v2_checkpoints_eager_fallback 
    _result = _execute.execute(b"MergeV2Checkpoints", 0, inputs=_inputs_flat, 
  File "...Python38\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute 
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name, 
tensorflow.python.framework.errors_impl.UnknownError: Failed to rename: optm_python\bayes\trial_18985bf723f3fa00697399e0ad7712cc\checkpoints\epoch_0\checkpoint_temp_c52e87337ed14d0dac98560fcadb2092/part-00000-of-00001.data-00000-of-00001 to: optm_python\bayes\trial_18985bf723f3fa00697399e0ad7712cc\checkpoints\epoch_0\checkpoint.data-00000-of-00001 : Access is denied  
; Input/output error [Op:MergeV2Checkpoints] 

Is there anyone who works on tensorflow.python.lib.io who could comment on why the ‘overwrite’ flag for ‘rename’ in that subpackage sometimes does nothing? It would be cleaner to fix the underlying code than to keep relying on my workaround.