audio: Tracking Issue for Flaky File-like obj CI failure

Some tests for file-like object support for load function are failing sporadicly. Retrying the job makes the failure go away, but we need to understand what is causing the flaky failure.

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 16

Commits related to this issue

Most upvoted comments

Resolved by #1297

Okay, I think I figured. The is_seekable function is checking the stat.st_mode of an opened file, which is invalid in case of memory object.

https://github.com/dmkrepo/libsox/blob/b9dd1a86e71bbd62221904e3e59dfaa9e5e72046/src/formats.c#L336

Checking the data from failure mode 2; the source audio file and audio files processed with sox have the same checksum. This indicates that there is some un-deterministic behavior in torchaudio’s implementation.

Success - torchaudio_unittest.backend.sox_io.save_test.SaveTest.test_save_flac_bytesio_16_0.tar.gz

Failure - torchaudio_unittest.backend.sox_io.save_test.SaveTest.test_save_flac_bytesio_16_0.tar.gz

Checksum of generated files.

Success

$ sha256sum torchaudio_unittest.backend.sox_io.save_test.SaveTest.test_save_flac_bytesio_16_0/*
1a43dd5202839020aa2bc8dbdf10bd5d4038359221f0bfd5a310233fa18f74a9  1.source.wav
94d012cd58ab008f214af795b95f36688a1fa6263df2bbfc8176f04f36280bf9  2.1.torchaudio.flac
7dee548df31d8e8f53d208568e5f559f6641ea8598bf464b5c1e0c23dd714462  2.2.result.wav
7471e954e0fd44a710b8ed75a42370ba2b709d63b47eb30789db13d02fc85033  3.1.sox.flac
7dee548df31d8e8f53d208568e5f559f6641ea8598bf464b5c1e0c23dd714462  3.2.ref.wav

Failure

$ sha256sum torchaudio_unittest.backend.sox_io.save_test.SaveTest.test_save_flac_bytesio_16_0/*
1a43dd5202839020aa2bc8dbdf10bd5d4038359221f0bfd5a310233fa18f74a9  1.source.wav
efa33771a51c6ccbb41615c2729335aa8f5a0782dc898e6599305493e2d5dd27  2.1.torchaudio.flac
d63b694fea46b766676589b1fb8ce720b495c22e3282dae0e830e5d9e6fe86ae  2.2.result.wav
7471e954e0fd44a710b8ed75a42370ba2b709d63b47eb30789db13d02fc85033  3.1.sox.flac
7dee548df31d8e8f53d208568e5f559f6641ea8598bf464b5c1e0c23dd714462  3.2.ref.wav

Where are they different?

Truncating the first 10K (12K total) gives the same check sum. This suggests that the un-deterministic behavior occurs at the end where the file is closed.

Success

$ truncate -s 10K  2.1.torchaudio.flac
$ sha256sum 2.1.torchaudio.flac
59cdc96b07b36031aa50ffed02ed2c5de1e8024c5e7d13015265d1569923ba1d  2.1.torchaudio.flac

Failure

$ truncate -s 10K  2.1.torchaudio.flac
$ sha256sum  2.1.torchaudio.flac
59cdc96b07b36031aa50ffed02ed2c5de1e8024c5e7d13015265d1569923ba1d  2.1.torchaudio.flac

How are they written?

Tapping into the number of bytes written each time write() is called, there is a slight difference in success case and failure case.

debug wrapper
class DebuggWrapper:
    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.req_count = 0
        self.count = 0

    def read(self, size: int) -> bytes:
        bytes_ = self.fileobj.read(size)
        self.req_count += size
        self.count += len(bytes_)
        print(f'Read: {self.count} / {self.req_count}', file=sys.stderr)
        return bytes_

    def write(self, data: bytes):
        self.count += len(data)
        print(f'Write: {self.count}', file=sys.stderr)
        self.fileobj.write(data)

Success

Write: 1734
Write: 3899
Write: 5525
Write: 7690
Write: 9316
Write: 10940
Write: 11506

Failure

Write: 1734
Write: 3899
Write: 5525
Write: 7690
Write: 9316
Write: 10940
Write: 11048

Potentially problematic code?

The final write is called at

https://github.com/pytorch/audio/blob/3488f3142c734326b54badfba4b166173647d1b2/torchaudio/csrc/sox/io.cpp#L299-L303

The problem here is that, to get the final byte chunk, we need to close the sox_format_t structure. But doing so for the case of in-memory file object, we will lose the way to see how many bytes are written to the buffer. In the above code we use buffer.size, hoping that under the hood, the new buffer size is same as the number of data written, which could be wrong…

But where is the un-deterministic behavior coming from?

The above investigation still does not explain where is the origin. Is it open_memstream used by libsox? libsox implementation or flac encoder?

Checking at the data from the failure mode 2;

torchaudio_unittest.backend.sox_io.save_test.SaveTest.test_save_flac_bytesio_16_0.tar.gz

looks like ~the conversion from flac to wav is altering the data.~ flac data has something wrong.

$ soxi *

Input File     : '1.source.wav'
Channels       : 2
Sample Rate    : 8000
Precision      : 32-bit
Duration       : 00:00:03.00 = 24000 samples ~ 225 CDDA sectors
File Size      : 192k
Bit Rate       : 512k
Sample Encoding: 32-bit Signed Integer PCM


Input File     : '2.1.torchaudio.flac'
Channels       : 2
Sample Rate    : 8000
Precision      : 16-bit
Duration       : 00:00:03.00 = 24000 samples ~ 225 CDDA sectors
File Size      : 11.0k
Bit Rate       : 29.5k
Sample Encoding: 16-bit FLAC


Input File     : '2.2.result.wav'
Channels       : 2
Sample Rate    : 8000
Precision      : 25-bit
Duration       : 00:00:02.88 = 23040 samples ~ 216 CDDA sectors
File Size      : 184k
Bit Rate       : 512k
Sample Encoding: 32-bit Floating Point PCM


Input File     : '3.1.sox.flac'
Channels       : 2
Sample Rate    : 8000
Precision      : 16-bit
Duration       : 00:00:03.00 = 24000 samples ~ 225 CDDA sectors
File Size      : 11.5k
Bit Rate       : 30.8k
Sample Encoding: 16-bit FLAC
Comment        : 'Comment=Processed by SoX'


Input File     : '3.2.ref.wav'
Channels       : 2
Sample Rate    : 8000
Precision      : 25-bit
Duration       : 00:00:03.00 = 24000 samples ~ 225 CDDA sectors
File Size      : 192k
Bit Rate       : 512k
Sample Encoding: 32-bit Floating Point PCM

Total Duration of 5 files: 00:00:14.88

Audacity also reports that the final part of data is missing.

Screen Shot 2021-02-18 at 20 49 25