tesseract: tesseract 4.00 exit 1 with several different tiffs

I get the error below when passing several different images to tesseract:

ObjectCache(0x7f3b07159740)::~ObjectCache(): WARNING! LEAK! object 0x1efe4f0 still has count 1 (id /usr/local/share/tessdata/dan.traineddatalstm-punc-dawg)
ObjectCache(0x7f3b07159740)::~ObjectCache(): WARNING! LEAK! object 0x1ce7610 still has count 1 (id /usr/local/share/tessdata/dan.traineddatalstm-word-dawg)
ObjectCache(0x7f3b07159740)::~ObjectCache(): WARNING! LEAK! object 0x1ef08c0 still has count 1 (id /usr/local/share/tessdata/dan.traineddatalstm-number-dawg)
ObjectCache(0x7f3b07159740)::~ObjectCache(): WARNING! LEAK! object 0x3465870 still has count 1 (id /usr/local/share/tessdata/eng.traineddatalstm-punc-dawg)
ObjectCache(0x7f3b07159740)::~ObjectCache(): WARNING! LEAK! object 0x298c100 still has count 1 (id /usr/local/share/tessdata/eng.traineddatalstm-word-dawg)
ObjectCache(0x7f3b07159740)::~ObjectCache(): WARNING! LEAK! object 0x1ef0240 still has count 1 (id /usr/local/share/tessdata/eng.traineddatalstm-number-dawg)

I use a modification of this example from the wiki… It works on most images, but some images make the tesseract program return exit code 1.

I use OEM_LSTM_ONLY

tesseract::TessBaseAPI::Version() => 4.0.0-86-gbee8

Here are the images: https://imgur.com/a/e5vyzLL

  Pix *image = pixRead("/usr/src/tesseract/testing/phototest.tif");
  tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
  api->Init(NULL, "eng");
  api->SetImage(image);
  api->Recognize(0);
  tesseract::ResultIterator* ri = api->GetIterator();
  tesseract::PageIteratorLevel level = tesseract::RIL_WORD;
  if (ri != 0) {
    do {
      const char* word = ri->GetUTF8Text(level);
      float conf = ri->Confidence(level);
      int x1, y1, x2, y2;
      ri->BoundingBox(level, &x1, &y1, &x2, &y2);
      printf("word: '%s';  \tconf: %.2f; BoundingBox: %d,%d,%d,%d;\n",
               word, conf, x1, y1, x2, y2);
      delete[] word;
    } while (ri->Next(level));
  }

Installation

- Install Leptonica
# apt-get install libleptonica-dev

- Install 4.00 from source
# apt-get install autoconf automake libtool pkg-config
# cd /var/bin && git clone https://github.com/tesseract-ocr/tesseract.git tesseract-4.00
# cd tesseract-4.00 && ./autogen.sh && ./configure && make -j $(nproc) && make install && ldconfig

- Install the fast language data (tessdata_fast, for 4.00; only works with OEM_LSTM_ONLY)
# cd /usr/local/share/tessdata && wget https://github.com/tesseract-ocr/tessdata_fast/raw/master/eng.traineddata && wget https://github.com/tesseract-ocr/tessdata_fast/raw/master/dan.traineddata

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 31 (6 by maintainers)

Most upvoted comments

This is not a bug (at least, not a proven one). The solution is to use the API correctly (which you did not). If you do not know how, ask in the right place: use the forum.
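
To illustrate what correct usage probably means here: the warnings look like the output of a reference-counted cache that, when it is destroyed, reports every entry that is still in use. The sketch below is a toy model of that mechanism only, not Tesseract's code. `ToyCache`, `ToyApi` and `RunScenario` are hypothetical names invented for the illustration, and the toy API object releases its entry only in `End()`, which is a simplification.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Toy stand-in for a reference-counted object cache (hypothetical, not
// Tesseract's ObjectCache). It counts users per id and complains on
// destruction about any id whose count is still above zero.
class ToyCache {
 public:
  explicit ToyCache(std::ostream& log) : log_(log) {}

  ~ToyCache() {
    for (const auto& entry : counts_) {
      if (entry.second > 0) {
        log_ << "WARNING! LEAK! " << entry.first << " still has count "
             << entry.second << "\n";
      }
    }
  }

  void Acquire(const std::string& id) { ++counts_[id]; }

  void Release(const std::string& id) {
    auto it = counts_.find(id);
    assert(it != counts_.end() && it->second > 0);
    --it->second;
  }

 private:
  std::ostream& log_;
  std::map<std::string, int> counts_;
};

// Toy stand-in for an API object (hypothetical). It takes one cache entry
// when constructed and gives it back only when End() is called.
class ToyApi {
 public:
  ToyApi(ToyCache& cache, std::string id)
      : cache_(cache), id_(std::move(id)) {
    cache_.Acquire(id_);
  }

  void End() {
    if (!ended_) {
      cache_.Release(id_);
      ended_ = true;
    }
  }

 private:
  ToyCache& cache_;
  std::string id_;
  bool ended_ = false;
};

// Runs one scenario and returns whatever the cache logged when it was
// destroyed. call_end selects whether the API object is shut down first.
std::string RunScenario(bool call_end) {
  std::ostringstream log;
  {
    ToyCache cache(log);
    ToyApi api(cache, "eng-word-dawg");
    if (call_end) api.End();
  }  // cache is destroyed here and reports entries still in use
  return log.str();
}
```

`RunScenario(true)` returns an empty string, while `RunScenario(false)` returns a single leak warning for `eng-word-dawg`. In the snippet from the report, the analogous step would presumably be to call `api->End()` and delete `ri` and `api` before the program exits. Whether that also removes the exit code 1 for these particular images is not something this sketch can show.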

I don’t see how someone using the API can handle the dawgs caching like tesseract does.

@stweil, can you say what you think about this issue?