tensorflow: Issue with retrain.py, "could not convert string to float" when creating bottlenecks

Environment info

Operating System: Ubuntu 14.04

Installed version of CUDA and cuDNN: (please attach the output of ls -l /path/to/cuda/lib/libcud*): CUDA 7.5, cuDNN v2 (6.5)

lrwxrwxrwx 1 root root 19 May 9 09:11 /usr/local/cuda -> /usr/local/cuda-7.5

Installed the nightly pip package from April 12th, with GPU support.

TensorFlow version: 0.7.1

If installed from sources, provide the commit hash:

Steps to reproduce

  1. Running retrain.py on a very large image directory (around 600k files) fails about three-quarters of the way through, during the bottleneck creation process. It always seems to fail within the same label folder, but none of the file names in that folder appear to be corrupted or incorrectly named.

What have you tried?

  1. Changing the number of training steps
  2. Checking whether any files are smaller than 30k (in theory, very small files are likely to contain corrupted JPEG data)
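The size check in step 2 could be sketched roughly as follows. This is only an illustration, not part of retrain.py: the function name `find_small_files` and the 30 KiB default threshold are assumptions made here, and a small file is merely a hint of corruption, not proof.

```python
import os


def find_small_files(root_dir, min_bytes=30 * 1024):
    """Return sorted paths under root_dir whose size is below min_bytes.

    Hypothetical helper: the name and the 30 KiB default are illustrative.
    """
    small = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # getsize reports the size on disk in bytes.
            if os.path.getsize(path) < min_bytes:
                small.append(path)
    return sorted(small)
```

Calling it on the image directory returns the suspicious files, which can then be inspected by hand.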

Logs or other output that would be helpful

For 610,000 files, the process can’t seem to get through more than 415k. Am I simply just using too many files? Or am I missing a very, very subtle naming convention issue?

Full stack trace:

  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "retrain_reshape_error.py", line 695, in main
    cache_bottlenecks(sess, image_lists, FLAGS.image_dir, FLAGS.bottleneck_dir, jpeg_data_tensor, bottleneck_tensor)
  File "retrain_reshape_error.py", line 400, in cache_bottlenecks
    image_dir, category, bottleneck_dir, jpeg_data_tensor, bottleneck_tensor)
  File "retrain_reshape_error.py", line 372, in get_or_create_bottleneck
    bottleneck_values = [float(x) for x in bottleneck_string.split(',')]
ValueError: could not convert string to float

I should also note that I have been using this training process for the last few months with absolutely no failure.

Thanks!

Oren

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 17 (9 by maintainers)

Most upvoted comments

So I placed some checks in the retrain.py file and it's working fine now. From what I understand, this is caused not by a faulty image file but by a faulty bottleneck file. As soon as the error was caught by my checks, the script started creating new bottleneck files instead of checking the cached ones.

In other words, this error arises when an incomplete bottleneck file exists in the cache (such a file can be left behind by a sudden system shutdown).
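To locate such files before a long run, the cache could be scanned with something like the sketch below. It assumes the cached files hold comma-separated floats, as the stack trace suggests; `find_bad_bottlenecks` and its optional `expected_len` argument are hypothetical names introduced here, not part of retrain.py.

```python
import os


def find_bad_bottlenecks(bottleneck_dir, expected_len=None):
    """Return sorted paths of cached files that do not parse as floats.

    If expected_len is given, files with a different value count are
    also reported, which catches truncation that still happens to parse.
    """
    bad = []
    for dirpath, _dirnames, filenames in os.walk(bottleneck_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, 'r') as f:
                    values = [float(x) for x in f.read().split(',')]
            except ValueError:
                # An empty file or a trailing comma yields '' and float('') fails.
                bad.append(path)
                continue
            if expected_len is not None and len(values) != expected_len:
                bad.append(path)
    return sorted(bad)
```

Note that a file cut off in the middle of a number (for example ending in `0.`) can still parse, so passing the expected vector length is the more reliable check. Deleting the reported files should let the script regenerate them on the next run.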

Please add some checks to the code. I’ll paste my solution below:

#retrain.py
...
def get_or_create_bottleneck(sess, image_lists, label_name, index, image_dir,
                             category, bottleneck_dir, jpeg_data_tensor,
                             bottleneck_tensor):
...
  with open(bottleneck_path, 'r') as bottleneck_file:
    bottleneck_string = bottleneck_file.read()
  
    # Guard against non-float values from an incomplete or corrupt cache file.
    try:
      bottleneck_values = [float(x) for x in bottleneck_string.split(',')]
    except ValueError:
      print("Invalid float found, sending None instead")
      return None

  return bottleneck_values

...
def cache_bottlenecks(sess, image_lists, image_dir, bottleneck_dir,
                      jpeg_data_tensor, bottleneck_tensor):
...
      for index, unused_base_name in enumerate(category_list):
        bNeck = get_or_create_bottleneck(sess, image_lists, label_name, index,
                                 image_dir, category, bottleneck_dir,
                                 jpeg_data_tensor, bottleneck_tensor)
        if bNeck is not None:
          how_many_bottlenecks += 1
          if how_many_bottlenecks % 100 == 0:
            print(str(how_many_bottlenecks) + ' bottleneck files created.')

...
def get_random_cached_bottlenecks(sess, image_lists, how_many, category,
                                  bottleneck_dir, image_dir, jpeg_data_tensor,
                                  bottleneck_tensor):
...
  for unused_i in range(how_many):
    label_index = random.randrange(class_count)
    label_name = list(image_lists.keys())[label_index]
    image_index = random.randrange(MAX_NUM_IMAGES_PER_CLASS + 1)
    bottleneck = get_or_create_bottleneck(sess, image_lists, label_name,
                                          image_index, image_dir, category,
                                          bottleneck_dir, jpeg_data_tensor,
                                          bottleneck_tensor)
    if bottleneck is not None:
      ground_truth = np.zeros(class_count, dtype=np.float32)
      ground_truth[label_index] = 1.0
      bottlenecks.append(bottleneck)
      ground_truths.append(ground_truth)
  return bottlenecks, ground_truths

...
def get_random_distorted_bottlenecks(
    sess, image_lists, how_many, category, image_dir, input_jpeg_tensor,
    distorted_image, resized_input_tensor, bottleneck_tensor):
...
    bottleneck = run_bottleneck_on_image(sess, distorted_image_data,
                                         resized_input_tensor,
                                         bottleneck_tensor)
    if bottleneck is not None:
      ground_truth = np.zeros(class_count, dtype=np.float32)
      ground_truth[label_index] = 1.0
      bottlenecks.append(bottleneck)
      ground_truths.append(ground_truth)
  return bottlenecks, ground_truths
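
A possible variation on the checks above, instead of returning None and skipping the sample, is to recompute the bottleneck and overwrite the bad cache file. The sketch below is only an assumption of how that could look: `load_or_recreate_bottleneck` and `compute_fn` are hypothetical names (`compute_fn` stands in for whatever produces the values, such as a call to run_bottleneck_on_image), and it uses Python 3's `os.replace`, which Python 2.7 does not have.

```python
import os


def load_or_recreate_bottleneck(path, compute_fn):
    """Return cached floats from path, recomputing if missing or invalid.

    compute_fn is a hypothetical zero-argument callable returning the values.
    """
    try:
        with open(path, 'r') as f:
            return [float(x) for x in f.read().split(',')]
    except (IOError, ValueError):
        # Missing, unreadable, or malformed cache file: fall through and rebuild.
        pass

    values = [float(v) for v in compute_fn()]
    # Write to a temporary file first, then rename it over the target, so an
    # interrupted write is less likely to leave a half-written cache file.
    tmp_path = path + '.tmp'
    with open(tmp_path, 'w') as f:
        f.write(','.join(str(v) for v in values))
    os.replace(tmp_path, path)
    return values
```

This keeps the number of samples per batch unchanged, at the cost of one extra bottleneck computation per bad file. It does not validate the length of the cached vector, so a truncated file that still parses would be returned as-is.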