spleeter: [Bug] Restoring from checkpoint failed when synchronously separating to file within a loop

Description

Inside a for loop, I create a new Separator instance and then invoke separate_to_file. After the first iteration, TensorFlow errors out while trying to restore from the checkpoint, which I take to be the files under pretrained_models for the stem configuration I chose:

Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key batch_normalization_24/beta not found in checkpoint [[node save_1/RestoreV2 (defined at […]/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
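
My unconfirmed guess is that every new Separator adds another copy of the model to the same TensorFlow default graph, so the batch-normalization variables get renamed on each pass (which would also explain the jump to batch_normalization_48 and batch_normalization_100 in the server case further down) and stop matching the checkpoint keys. Below is a minimal sketch of that graph behaviour, written by me and independent of spleeter, just to show what I mean:

# graph_name_demo.py -- illustration only, not spleeter code; variable names are made up
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
  for _ in range(2):
    # Creating "the same" variable twice in one graph silently uniquifies the
    # second name instead of reusing the first one.
    tf.compat.v1.Variable(0.0, name='beta')
  print([v.name for v in tf.compat.v1.global_variables()])
  # Expect something like ['beta:0', 'beta_1:0'] -- the second name would no
  # longer match a checkpoint entry saved under 'beta'.

If separate_to_file builds the model into the default graph in a similar way, that would explain why only the second and later calls fail.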

Steps to reproduce

  1. Create a virtual environment
  2. Activate the virtual environment
  3. Run pip3 install spleeter
  4. Save the following script as spleeter_loop_error.py:
# spleeter_loop_error.py
import argparse
from pathlib import Path
from shutil import rmtree
from spleeter.separator import Separator

def debug_print(message):
  print('\n----')
  print(f'\t{message}')
  print('----\n')


# Create temporary directory in the directory that the script is saved to
def init_outdir():
  outdir = Path(__file__).parent.absolute() / '__example_temp'
  if outdir.is_dir():
    rmtree(outdir)

  outdir.mkdir()
  return outdir


if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  parser.add_argument('-f', '--filename', type=str, required=True)
  parser.add_argument('-o', '--outdir', help='where spleeter will save output files', type=str, required=False)
  args = parser.parse_args()

  if args.outdir:
    # Wrap the user-supplied path in Path so the `outdir / str(i)` join below works
    outdir = Path(args.outdir)
  else:
    outdir = init_outdir()

  for i in range(2):
    separator = Separator('spleeter:2stems', stft_backend='librosa') 
    separator.separate_to_file(args.filename, outdir / str(i), filename_format='{instrument}.{codec}')
    debug_print(f'Successfully split ({i+1})')
  5. Run python3 spleeter_loop_error.py -f [path to audio file]
  6. The error occurs during the second loop iteration (the output below is from this script; a single-instance comparison sketch follows these steps)
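
For comparison, here is the variant I would test against, with the Separator constructed once outside the loop. The file name, and my expectation that it behaves differently, are assumptions on my part; I have not verified it yet.

# spleeter_loop_variant.py -- hypothetical comparison case, untested
import argparse
from pathlib import Path
from spleeter.separator import Separator

if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  parser.add_argument('-f', '--filename', type=str, required=True)
  args = parser.parse_args()

  outdir = Path(__file__).parent.absolute() / '__example_temp_variant'
  outdir.mkdir(exist_ok=True)

  # Build the Separator once, so the TensorFlow graph is only constructed one time
  separator = Separator('spleeter:2stems', stft_backend='librosa')
  for i in range(2):
    separator.separate_to_file(args.filename, outdir / str(i), filename_format='{instrument}.{codec}')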

Output

([...] replaces the path from the filesystem root to the virtual environment)

WARNING:tensorflow:From [...]/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From [...]/lib/python3.7/site-packages/spleeter/separator.py:144: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

WARNING:tensorflow:From [...]/lib/python3.7/site-packages/spleeter/separator.py:146: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2020-04-01 16:47:40.900998: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-04-01 16:47:40.901023: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
2020-04-01 16:47:40.901042: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (thatman-Z68A-D3H-B3): /proc/driver/nvidia/version does not exist
2020-04-01 16:47:40.925211: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3309580000 Hz
2020-04-01 16:47:40.925541: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x63692e0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-04-01 16:47:40.925577: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
INFO:spleeter:File [...]/__example_temp/0/vocals.wav written succesfully
INFO:spleeter:File [...]/__example_temp/0/accompaniment.wav written succesfully

----
	Successfully split (1)
----

2020-04-01 16:48:00.224581: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key batch_normalization_24/beta not found in checkpoint
Traceback (most recent call last):
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key batch_normalization_24/beta not found in checkpoint
	 [[{{node save_1/RestoreV2}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 1290, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key batch_normalization_24/beta not found in checkpoint
	 [[node save_1/RestoreV2 (defined at [...]/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]

Original stack trace for 'save_1/RestoreV2':
  File "bare_loopy_spleeter.py", line 38, in <module>
    separator.separate_to_file(args.filename, outdir / str(i), filename_format='{instrument}.{codec}')
  File "[...]/lib/python3.7/site-packages/spleeter/separator.py", line 191, in separate_to_file
    sources = self.separate(waveform, audio_descriptor)
  File "[...]/lib/python3.7/site-packages/spleeter/separator.py", line 157, in separate
    return self.separate_librosa(waveform, audio_descriptor)
  File "[...]/lib/python3.7/site-packages/spleeter/separator.py", line 144, in separate_librosa
    saver = tf.train.Saver()
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 878, in _build
    build_restore=build_restore)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 508, in _build_internal
    restore_sequentially, reshape)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 1300, in restore
    names_to_keys = object_graph_key_mapping(save_path)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 1618, in object_graph_key_mapping
    object_graph_string = reader.get_tensor(trackable.OBJECT_GRAPH_PROTO_KEY)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/pywrap_tensorflow_internal.py", line 915, in get_tensor
    return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "bare_loopy_spleeter.py", line 38, in <module>
    separator.separate_to_file(args.filename, outdir / str(i), filename_format='{instrument}.{codec}')
  File "[...]/lib/python3.7/site-packages/spleeter/separator.py", line 191, in separate_to_file
    sources = self.separate(waveform, audio_descriptor)
  File "[...]/lib/python3.7/site-packages/spleeter/separator.py", line 157, in separate
    return self.separate_librosa(waveform, audio_descriptor)
  File "[...]/lib/python3.7/site-packages/spleeter/separator.py", line 147, in separate_librosa
    saver.restore(sess, latest_checkpoint)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 1306, in restore
    err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key batch_normalization_24/beta not found in checkpoint
	 [[node save_1/RestoreV2 (defined at [...]/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]

Original stack trace for 'save_1/RestoreV2':
  File "bare_loopy_spleeter.py", line 38, in <module>
    separator.separate_to_file(args.filename, outdir / str(i), filename_format='{instrument}.{codec}')
  File "[...]/lib/python3.7/site-packages/spleeter/separator.py", line 191, in separate_to_file
    sources = self.separate(waveform, audio_descriptor)
  File "[...]/lib/python3.7/site-packages/spleeter/separator.py", line 157, in separate
    return self.separate_librosa(waveform, audio_descriptor)
  File "[...]/lib/python3.7/site-packages/spleeter/separator.py", line 144, in separate_librosa
    saver = tf.train.Saver()
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 878, in _build
    build_restore=build_restore)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 508, in _build_internal
    restore_sequentially, reshape)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "[...]/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

Environment

OS: Ubuntu 19.10 (eoan)
Installation type: pip 20.0.2, Python 3.7.5
RAM available: 13 GB
Hardware spec: Intel i5-2500, Radeon HD 6850

Additional context

How I discovered this issue:

In my actual use case, I invoke Separator.separate_to_file from within a Python-based server. This works fine in my development environment. However, after I switched to an asynchronous server, spleeter errors out in much the same way, except that the key TensorFlow expects is slightly different.

  1. I start the server.
  2. Hitting the endpoint the first time works as expected.
  3. After hitting the same endpoint a second time, I get:

INFO: Apply unet for vocals_spectrogram
INFO: Apply unet for drums_spectrogram
INFO: Apply unet for bass_spectrogram
INFO: Apply unet for other_spectrogram
INFO: Restoring parameters from pretrained_models/4stems/model
2020-04-01 17:13:17.556825: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key batch_normalization_48/beta not found in checkpoint

  4. And the third time:

INFO: Apply unet for vocals_spectrogram
INFO: Apply unet for drums_spectrogram
INFO: Apply unet for bass_spectrogram
INFO: Apply unet for other_spectrogram
INFO: Restoring parameters from pretrained_models/4stems/model
2020-04-01 17:15:57.301420: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key batch_normalization_100/beta not found in checkpoint

  5. As far as I can tell, every subsequent request expects the same batch_normalization_100/beta key.

It is my hope that if I can resolve the issue described above, I can solve my server-based issue in much the same way. Thank you for making this amazing project available to the public!
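
If the duplicated-graph guess above is right, one thing I may try on the server side is clearing the default graph before each request builds a fresh Separator. The handler and file names below are made up, and whether spleeter tolerates this is exactly what I do not know yet; tf.compat.v1.reset_default_graph() itself is a standard TensorFlow call. A sketch:

# server_handler_sketch.py -- hypothetical per-request handler, framework-agnostic and untested
import tensorflow as tf
from spleeter.separator import Separator

def handle_separation_request(input_path, output_path):
  # Drop whatever the previous request left in the default graph, so the new
  # Separator does not append a second, renamed copy of the model to it.
  tf.compat.v1.reset_default_graph()
  separator = Separator('spleeter:4stems')  # 4stems, matching the server logs above
  separator.separate_to_file(input_path, output_path)

The other option I see is to build one long-lived Separator at server start-up, as in the comparison script under the reproduction steps, and reuse it for every request.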

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 15
