tesseract: tesseract 4.00 exit 1 with several different tiffs
I get the error below when passing several different images to tesseract:
ObjectCache(0x7f3b07159740)::~ObjectCache(): WARNING! LEAK! object 0x1efe4f0 still has count 1 (id /usr/local/share/tessdata/dan.traineddatalstm-punc-dawg)
ObjectCache(0x7f3b07159740)::~ObjectCache(): WARNING! LEAK! object 0x1ce7610 still has count 1 (id /usr/local/share/tessdata/dan.traineddatalstm-word-dawg)
ObjectCache(0x7f3b07159740)::~ObjectCache(): WARNING! LEAK! object 0x1ef08c0 still has count 1 (id /usr/local/share/tessdata/dan.traineddatalstm-number-dawg)
ObjectCache(0x7f3b07159740)::~ObjectCache(): WARNING! LEAK! object 0x3465870 still has count 1 (id /usr/local/share/tessdata/eng.traineddatalstm-punc-dawg)
ObjectCache(0x7f3b07159740)::~ObjectCache(): WARNING! LEAK! object 0x298c100 still has count 1 (id /usr/local/share/tessdata/eng.traineddatalstm-word-dawg)
ObjectCache(0x7f3b07159740)::~ObjectCache(): WARNING! LEAK! object 0x1ef0240 still has count 1 (id /usr/local/share/tessdata/eng.traineddatalstm-number-dawg)
I use a modification of this example from the wiki… It works on most images, but some images make the tesseract program return exit code 1.
I use OEM_LSTM_ONLY
tesseract::TessBaseAPI.Version() => 4.0.0-86-gbee8
Here are the images https://imgur.com/a/e5vyzLL
Pix *image = pixRead("/usr/src/tesseract/testing/phototest.tif");
tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
api->Init(NULL, "eng");
api->SetImage(image);
api->Recognize(0);
tesseract::ResultIterator* ri = api->GetIterator();
tesseract::PageIteratorLevel level = tesseract::RIL_WORD;
if (ri != 0) {
  do {
    const char* word = ri->GetUTF8Text(level);
    float conf = ri->Confidence(level);
    int x1, y1, x2, y2;
    ri->BoundingBox(level, &x1, &y1, &x2, &y2);
    printf("word: '%s'; \tconf: %.2f; BoundingBox: %d,%d,%d,%d;\n",
           word, conf, x1, y1, x2, y2);
    delete[] word;
  } while (ri->Next(level));
}
install
- Install Leptonica
# apt-get install libleptonica-dev
- Install 4.00 from source
# apt-get install autoconf automake libtool pkg-config
# cd /var/bin && git clone https://github.com/tesseract-ocr/tesseract.git tesseract-4.00
# cd tesseract-4.00 && ./autogen.sh && ./configure && make -j $(nproc) && make install && ldconfig
- Install the fast language data (tessdata_fast; 4.00 only, works only with OEM_LSTM_ONLY)
# cd /usr/local/share/tessdata && wget https://github.com/tesseract-ocr/tessdata_fast/raw/master/eng.traineddata && wget https://github.com/tesseract-ocr/tessdata_fast/raw/master/dan.traineddata
About this issue
- State: closed
- Created 6 years ago
- Comments: 31 (6 by maintainers)
This is not a bug (at least not a proven one). The solution is to use the API correctly (which you did not). If you do not know how, ask in the right place: use the forum.
I don’t see how someone using the API can handle the dawgs caching like tesseract does.
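For readers unfamiliar with the warning in the log above: its wording suggests it is printed from a cache destructor (`~ObjectCache`) when an entry still has a nonzero reference count. The following is a toy model of that idea only, not Tesseract's implementation. `ToyCache`, `Dawg`, `LiveCount` and `DemoLeak` are hypothetical names made up for illustration, and the ids are invented.

```cpp
#include <cstdio>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a cached dictionary object.
struct Dawg {
  std::string id;
};

// Toy reference-counted cache (an assumption-laden sketch, not Tesseract code).
class ToyCache {
 public:
  // Return the object for `id`, creating it on first use, and add one reference.
  Dawg* Get(const std::string& id) {
    for (auto& e : entries_) {
      if (e.id == id) {
        ++e.count;
        return e.object.get();
      }
    }
    Entry e;
    e.id = id;
    e.object = std::make_unique<Dawg>();
    e.object->id = id;
    e.count = 1;
    entries_.push_back(std::move(e));
    return entries_.back().object.get();
  }

  // Drop one reference. Returns false if the pointer is unknown or already at 0.
  bool Free(Dawg* object) {
    for (auto& e : entries_) {
      if (e.object.get() == object && e.count > 0) {
        --e.count;
        return true;
      }
    }
    return false;
  }

  // Number of entries that still have at least one reference.
  int LiveCount() const {
    int live = 0;
    for (const auto& e : entries_) {
      if (e.count > 0) ++live;
    }
    return live;
  }

  // Warn about every entry that was never fully released.
  // The toy still frees the memory afterwards via unique_ptr.
  ~ToyCache() {
    for (const auto& e : entries_) {
      if (e.count > 0) {
        std::fprintf(stderr,
                     "WARNING! LEAK! object %p still has count %d (id %s)\n",
                     static_cast<void*>(e.object.get()), e.count, e.id.c_str());
      }
    }
  }

 private:
  struct Entry {
    std::string id;
    std::unique_ptr<Dawg> object;
    int count = 0;
  };
  std::vector<Entry> entries_;
};

// Takes two references, releases only one, and reports how many entries are
// still referenced right before the cache is destroyed.
int DemoLeak() {
  ToyCache cache;
  Dawg* word = cache.Get("eng/word-dawg");
  Dawg* punc = cache.Get("eng/punc-dawg");
  cache.Free(word);  // released correctly
  (void)punc;        // never released, so the destructor warns about it
  return cache.LiveCount();
}
```

Calling `DemoLeak()` from a `main` function prints one warning line to stderr for the entry that was never released. In this model the warning only means that some holder of the object did not release it before the cache went away; whether that is what happens in the report above is not established in this thread.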
@stweil, can you say what you think about this issue?