fst: Thread panics on read operations of FST set file
I’m seeing a very small percentage of “corrupt” FST set files that are triggering panics in Rust (leading to a Python interpreter segfault). The errors look like:
thread '<unnamed>' panicked at 'index out of bounds: the len is 17498006 but the index is 15336395951936096993', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/fst-0.3.0/src/raw/node.rs:306:17
thread '<unnamed>' panicked at 'index out of bounds: the len is 89225255 but the index is 15119944950614189002', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/fst-0.3.0/src/raw/node.rs:306:17
thread '<unnamed>' panicked at 'index out of bounds: the len is 16285338 but the index is 3532794445444415790', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/fst-0.3.0/src/raw/node.rs:306:17
This occurs on approximately 13 out of 4635 files, which range in size from 20 MB to over 100 MB. I have not been able to narrow it down any further, but I wanted to ask what might cause this.
I’m shelling out to the fst binary (from the fst-bin crate) to merge multiple input files into larger ones, then running set operations on the merged output. The binary was built with Rust nightly; I’m not sure of the exact version at the moment.
About this issue
- State: closed
- Created 6 years ago
- Comments: 22 (8 by maintainers)
@davidblewett Naively, a checksum would be written at the end of the FST, and it would be a crc32c sum of all preceding bytes in the FST. If the stored checksum doesn’t match the one recomputed on read, you can be fairly confident that the FST has been corrupted somehow. The checksum would not, however, help you if the panics you’re seeing are the result of a bug in the FST builder itself.
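For illustration, here is a minimal, dependency-free Rust sketch of that naive scheme. It assumes the checksum is stored as a little-endian `u32` in the last four bytes of the file. The function names (`crc32c`, `append_checksum`, `verify_checksum`) are hypothetical and are not part of the fst 0.3.0 API. CRC-32C is computed bit by bit here for clarity rather than speed.

```rust
/// Reflected CRC-32C (Castagnoli) polynomial.
const CRC32C_POLY: u32 = 0x82F6_3B78;

/// Computes CRC-32C over `data`, one bit at a time (slow but self-contained).
fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = !0;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All ones if the low bit is set, otherwise zero.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (CRC32C_POLY & mask);
        }
    }
    !crc
}

/// Hypothetical: appends a CRC-32C of every preceding byte as a
/// little-endian u32 trailer (assumed layout, not fst's real format).
fn append_checksum(buf: &mut Vec<u8>) {
    let sum = crc32c(buf);
    buf.extend_from_slice(&sum.to_le_bytes());
}

/// Hypothetical: recomputes the checksum over everything except the
/// 4-byte trailer and compares it with the stored value.
fn verify_checksum(buf: &[u8]) -> bool {
    if buf.len() < 4 {
        return false;
    }
    let (body, tail) = buf.split_at(buf.len() - 4);
    let stored = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]);
    crc32c(body) == stored
}

fn main() {
    // Stand-in bytes; a real FST would be read from disk.
    let mut fst_bytes = b"pretend these are serialized FST bytes".to_vec();
    append_checksum(&mut fst_bytes);
    assert!(verify_checksum(&fst_bytes));

    // Flip a single bit in the body to simulate on-disk corruption.
    fst_bytes[3] ^= 0x01;
    assert!(!verify_checksum(&fst_bytes));
    println!("corruption detected");
}
```

Running the check before any traversal would let a reader reject a damaged file up front instead of panicking on a garbage offset deep inside a node lookup. As noted above, it cannot catch a file that the builder wrote incorrectly, because the checksum would faithfully cover the bad bytes.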