tensorflow: Issue with retrain.py, "could not convert string to float" when creating bottlenecks
Environment info
Operating System:
Ubuntu 14.04
Installed version of CUDA and cuDNN:
(please attach the output of ls -l /path/to/cuda/lib/libcud*):
CUDA 7.5, cuDNN v2 (6.5)
lrwxrwxrwx 1 root root 19 May 9 09:11 /usr/local/cuda -> /usr/local/cuda-7.5
Installed the nightly pip package from April 12th, with GPU support. TensorFlow version: 0.7.1
If installed from sources, provide the commit hash: N/A (installed from the pip nightly package)
Steps to reproduce
- Running retrain.py with a very large image directory (around 600k files) causes training to fail about three-quarters of the way through the bottleneck-creation process. It always seems to fail within the same label folder, but none of the file names in that folder appear to be corrupted or incorrect.
What have you tried?
- Changing the number of training steps
- Checking whether any files are smaller than 30 KB (on the theory that very small files are likely to be corrupted JPEG data); a quick scan for this is sketched below
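For reference, a quick way to run that size check. This is a hypothetical standalone helper, not part of retrain.py, and the 30 KB threshold simply matches the check above:

```python
import os

def find_small_images(image_dir, min_bytes=30 * 1024):
    """Return paths of JPEGs under image_dir smaller than min_bytes."""
    small = []
    for root, _, files in os.walk(image_dir):
        for name in files:
            if name.lower().endswith(('.jpg', '.jpeg')):
                path = os.path.join(root, name)
                if os.path.getsize(path) < min_bytes:
                    small.append(path)
    return small

# Example: print any suspiciously small files in the training set.
for path in find_small_images('/path/to/image_dir'):
    print(path)
```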
Logs or other output that would be helpful
For 610,000 files, the process can't seem to get past roughly 415k. Am I simply using too many files, or am I missing a very subtle naming-convention issue?
Full stack trace:
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run sys.exit(main(sys.argv)) File "retrain_reshape_error.py", line 695, in main cache_bottlenecks(sess, image_lists, FLAGS.image_dir, FLAGS.bottleneck_dir, jpeg_data_tensor, bottleneck_tensor) File "retrain_reshape_error.py", line 400, in cache_bottlenecks image_dir, category, bottleneck_dir, jpeg_data_tensor, bottleneck_tensor) File "retrain_reshape_error.py", line 372, in get_or_create_bottleneck bottleneck_values = [float(x) for x in bottleneck_string.split(',')] ValueErrror: could not convert string to float
I should also note that I have been using this training process for the last few months without a single failure.
Thanks!
Oren
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Comments: 17 (9 by maintainers)
Commits related to this issue
- Adding checks for broken bottleneck files: An exception is raised when a broken cached bottleneck file is read. Such a file can be created as a result of a sudden system failure. This commit refers to the... — committed to rizasif/tensorflow by rizasif 7 years ago
- Resolved indentation errors: This commit refers to issue #2296 — committed to rizasif/tensorflow by rizasif 7 years ago
- Indentation issue for broken bottleneck script This commit refers to pull request #7131 and issue #2296 — committed to rizasif/tensorflow by rizasif 7 years ago
- Adding checks for broken bottleneck files (#7131) * Adding checks for broken bottleneck files: An exception is raised when a broken cached bottleneck file is read. Such a file can be created as a ... — committed to tensorflow/tensorflow by rizasif 7 years ago
So I placed some checks in the retrain.py file and it's working fine now. From what I understand, this is not caused by a faulty image file but by a faulty bottleneck file. As soon as the error was caught by my checks, the script started creating new bottleneck files instead of reading the cached ones.
In other words, this error arises if an incomplete bottleneck file exists in the cache (such a file can be produced by a sudden system shutdown).
Please add some checks to the code. I’ll paste my solution below:
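A minimal sketch of such a check (not the original paste, which is not preserved here), assuming retrain.py's comma-separated bottleneck cache format; load_or_rebuild_bottleneck and the rebuild_fn callable are illustrative names rather than the exact code that was merged:

```python
def load_or_rebuild_bottleneck(bottleneck_path, rebuild_fn):
    """Return bottleneck values, regenerating the cache file if it is broken."""
    try:
        with open(bottleneck_path, 'r') as bottleneck_file:
            bottleneck_string = bottleneck_file.read()
        return [float(x) for x in bottleneck_string.split(',')]
    except (IOError, ValueError):
        # The cached file is missing or truncated (e.g. from a sudden
        # shutdown), so recompute the bottleneck and rewrite the cache.
        bottleneck_values = rebuild_fn()
        with open(bottleneck_path, 'w') as bottleneck_file:
            bottleneck_file.write(','.join(str(x) for x in bottleneck_values))
        return bottleneck_values
```

In retrain.py terms, rebuild_fn would be roughly a small closure that recomputes the values for the image in question via the script's run_bottleneck_on_image helper.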