spleeter: [Bug] Possible memory leak in save

I have found a possible memory leak in the ffmpeg adapter in spleeter_utils/audio/ffmpeg.py when calling save:

        process = subprocess.Popen(
            command,
            stdout=open(os.devnull, 'wb'),
            stdin=subprocess.PIPE,
            stderr=subprocess.PIPE)
        
        # Write data to STDIN.
        try:
            process.stdin.write(data.astype('<f4').tostring())
        except IOError:
            raise IOError(f'FFMPEG error: {process.stderr.read()}')
        
        # Clean process.
        process.stdin.close()
        if process.stderr is not None:
            process.stderr.close()
        process.wait()

        get_logger().info('File %s written', path)
        
        ################################################################
        current_mem, peak_mem = tracemalloc.get_traced_memory()
        overhead = tracemalloc.get_tracemalloc_memory()
        summary = "traced memory: %d KiB  peak: %d KiB  overhead: %d KiB" % (
            int(current_mem // 1024), int(peak_mem // 1024), int(overhead // 1024)
        )
        print( "after save", summary )
        ################################################################

After consecutive calls the memory is not deallocated; the retained block corresponds to the whole input file (~30 MB each time):

before save traced memory: 31558 KiB  peak: 32248 KiB  overhead: 30933 KiB
after save traced memory: 31560 KiB  peak: 32946 KiB  overhead: 30935 KiB
before save traced memory: 63630 KiB  peak: 64324 KiB  overhead: 58610 KiB
after save traced memory: 63632 KiB  peak: 65018 KiB  overhead: 58611 KiB
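
For context, here is a minimal sketch of how such before/after numbers can be reproduced. It is not from the original report; it only assumes the adapter API visible in this thread (get_default_audio_adapter and the save(path, data, sample_rate, codec, bitrate) signature), the import path may differ between spleeter versions, and the silent waveform is just a stand-in for a real input file:

    import tracemalloc

    import numpy as np
    from spleeter.audio.adapter import get_default_audio_adapter

    tracemalloc.start()
    adapter = get_default_audio_adapter()
    # Roughly 60 s of stereo float32 silence at 44.1 kHz (~20 MB of samples).
    waveform = np.zeros((44100 * 60, 2), dtype=np.float32)

    for i in range(2):
        current, peak = tracemalloc.get_traced_memory()
        print('before save traced memory: %d KiB  peak: %d KiB' % (current // 1024, peak // 1024))
        adapter.save('out_%d.wav' % i, waveform, 44100, 'wav', '128k')
        current, peak = tracemalloc.get_traced_memory()
        print('after save traced memory: %d KiB  peak: %d KiB' % (current // 1024, peak // 1024))

If the leak is in the adapter, the "after save" figures should keep growing by roughly the size of the written waveform on every iteration.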

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 4
  • Comments: 15

Most upvoted comments

I’m still facing this memory leak issue. Also, the workaround results in the error "Do not use tf.reset_default_graph() to clear nested graphs. If you need a cleared graph, exit the nesting and create a new graph." (tensorflow==2.3, spleeter==2.2.2)

@mmoussallam can you help me out with this?
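
For what it is worth, here is a minimal, self-contained sketch of what that TensorFlow error message is asking for, not tied to spleeter internals: only call the reset once every as_default() context has been exited (build_graph_and_run is a placeholder for the actual separation work):

    import tensorflow as tf

    def build_graph_and_run(graph):
        # Placeholder for whatever work is done inside the graph.
        with graph.as_default():
            pass

    def process_once():
        graph = tf.Graph()
        build_graph_and_run(graph)
        # No as_default() block is active any more at this point, so the
        # reset no longer trips the "nested graphs" assertion.
        tf.compat.v1.reset_default_graph()
        tf.keras.backend.clear_session()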

This bug should be fixed now that we’ve moved on to TF 2

@loretoparisi Amazing! Cheers for the fast answer. I’ll give this a shot.

In the latest version of the source code I did it at the end of the separate_to_file API of the Separator class:

def separate_to_file(
            self, audio_descriptor, destination,
            audio_adapter=get_default_audio_adapter(),
            offset=0, duration=600., codec='wav', bitrate='128k',
            filename_format='{filename}/{instrument}.{codec}',
            synchronous=True):
        """ Performs source separation and export result to file using
        given audio adapter.

        Filename format should be a Python formattable string that could use
        following parameters : {instrument}, {filename} and {codec}.

        :param audio_descriptor:    Describe song to separate, used by audio
                                    adapter to retrieve and load audio data,
                                    in case of file based audio adapter, such
                                    descriptor would be a file path.
        :param destination:         Target directory to write output to.
        :param audio_adapter:       (Optional) Audio adapter to use for I/O.
        :param offset:              (Optional) Offset of loaded song.
        :param duration:            (Optional) Duration of loaded song.
        :param codec:               (Optional) Export codec.
        :param bitrate:             (Optional) Export bitrate.
        :param filename_format:     (Optional) Filename format.
        :param synchronous:         (Optional) True if operation should be synchronous.
        """
        waveform, _ = audio_adapter.load(
            audio_descriptor,
            offset=offset,
            duration=duration,
            sample_rate=self._sample_rate)

        with self.tf_session.as_default():
            with self.tf_session.graph.as_default():
                sources = self.separate(waveform)

        filename = splitext(basename(audio_descriptor))[0]
        generated = []

        for instrument, data in sources.items():

            if instrument == 'vocals':
                path = join(destination, filename_format.format(
                    filename=filename,
                    instrument=instrument,
                    codec=codec))
                
                audio_adapter.save(path, data, self._sample_rate, codec, bitrate)

        # Clean up the TF graph and Keras session.
        tf.reset_default_graph()
        tf.keras.backend.clear_session()
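
For illustration, a minimal driver sketch assuming the patched method above; the configuration name and file paths are placeholders. It processes several files with a single Separator instance and relies on the per-call cleanup to keep memory from accumulating:

    from spleeter.separator import Separator

    separator = Separator('spleeter:2stems')
    for track in ('first.mp3', 'second.mp3'):
        # With the patched separate_to_file, the graph and Keras session
        # are cleared at the end of every call.
        separator.separate_to_file(track, '/tmp/output')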

Thanks a lot for the investigation. Since it’s deep in the internals of TensorFlow, I guess we’ll just wait for a fix on their side and bump the version.

I am having problems with TensorFlow and Flask. Using billiard instead of multiprocessing, I get this error: Allocation of 572063744 exceeds 10% of system memory
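
Not from the thread itself, but one commonly used way to sidestep memory that accumulates across calls is process isolation: run each separation in a short-lived worker so that all TensorFlow and adapter memory is returned to the OS when the worker exits. A minimal sketch (the configuration name is an assumption, and billiard users would swap in billiard.Process):

    from multiprocessing import Process

    def _separate(track, destination):
        # Import inside the worker so all TensorFlow state is created,
        # used and destroyed within this process.
        from spleeter.separator import Separator
        Separator('spleeter:2stems').separate_to_file(track, destination)

    def separate_isolated(track, destination):
        worker = Process(target=_separate, args=(track, destination))
        worker.start()
        worker.join()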