tesseract: unicharset_extractor segfault
Current Behavior
After I built from the current main with debug symbols (configure --disable-openmp --enable-debug --disable-shared CXXFLAGS="-g -O0 -fsanitize=address,undefined -fstack-protector-strong -ftrapv"), trying to use tesstrain immediately segfaults on the unicharset_extractor step (all-gt is 313k, norm_mode=2, nothing unusual):
#0 0x7faa6352b17e in std::filesystem::__cxx11::path::compare(std::filesystem::__cxx11::path const&) const (/lib/x86_64-linux-gnu/libstdc++.so.6+0x19017e)
#1 0x562c491ddc50 in std::filesystem::__cxx11::operator==(std::filesystem::__cxx11::path const&, std::filesystem::__cxx11::path const&) (/data/ocr-d/ocrd_all/venv38/bin/unicharset_extractor+0x2556c50)
#2 0x562c491dc60d in Main /data/ocr-d/ocrd_all/tesseract/src/training/unicharset_extractor.cpp:74
#3 0x562c491dd09d in main /data/ocr-d/ocrd_all/tesseract/src/training/unicharset_extractor.cpp:120
#4 0x7faa625df6c9 (/lib/x86_64-linux-gnu/libc.so.6+0x276c9)
#5 0x7faa625df784 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x27784)
#6 0x562c491db8c0 in _start (/data/ocr-d/ocrd_all/venv38/bin/unicharset_extractor+0x25548c0)
I compiled with g++ 8.3.0.
Judging by the stack trace, there seems to be some incompatibility with the C++ std::filesystem path library here…
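To help narrow this down outside of Tesseract, here is a minimal sketch of the kind of comparison that frames #0 and #1 point at: an operator== on two std::filesystem::path objects. The helper name has_extension and the file names are hypothetical and not taken from unicharset_extractor.cpp; the only assumption is that the failing line compares paths in a similar way.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper (not from the Tesseract sources): returns true when
// file_name carries the extension ext. The comparison goes through
// operator==(const path&, const path&), which calls path::compare, i.e.
// the two functions shown in frames #1 and #0 of the trace above.
bool has_extension(const std::string &file_name, const std::string &ext) {
  const std::filesystem::path file_path(file_name);
  return file_path.extension() == std::filesystem::path(ext);
}
```

Calling has_extension("page.box", ".box") from a small main() is enough to exercise the comparison. With GCC 8, std::filesystem lives in a separate static library, so such a test program has to be built with something like g++ -std=c++17 repro.cpp -lstdc++fs. Whether a missing -lstdc++fs, or a mismatch between the GCC 8 headers and a newer libstdc++.so.6 at run time, is what happens in this build is only a guess based on frame #0 resolving into the shared libstdc++.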
Expected Behavior
unicharset_extractor should exit normally and produce its output.
Suggested Fix
No response
tesseract -v
tesseract 5.3.4
leptonica-1.76.0
libgif 5.1.4 : libjpeg 6b (libjpeg-turbo 1.5.2) : libpng 1.6.36 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Found AVX512BW
Found AVX512F
Found AVX512VNNI
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found libarchive 3.3.3 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 liblz4/1.8.3 libzstd/1.3.8
Found libcurl/7.64.0 NSS/3.42.1 zlib/1.2.11 libidn2/2.0.5 libpsl/0.20.2 (+libidn2/2.0.5) libssh2/1.11.0 nghttp2/1.59.0 librtmp/2.3
Operating System
Debian 11 Bullseye
Other Operating System
No response
uname -a
GNU/Linux x86_64
Compiler
g++ 8.3.0
CPU
Intel Xeon Gold
Virtualization / Containers
VMWare
Other Information
No response
About this issue
- State: closed
- Created 4 months ago
- Comments: 31 (20 by maintainers)
Personally, I don’t think we should care about GCC 8 anymore.
The Linux distros that have GCC 8.x as their default compiler: