tesseract: libarchive dependency wrong?

If I build Tesseract with autotools, it drags in libarchive (libarchive13 in my case), for the shared library as well as the CLI – in contrast to the prebuilt version.

But this fails for tesstrain-generated models:

archive_read_open_filename(...,/usr/local/share/tessdata/GT4HistOCR_2000000.traineddata,...) failed, Invalid or incomplete multibyte or wide character

So I tried to remove that dependency when calling configure, but none of the documented methods work: configure always ends up reporting checking for libarchive... yes. So far, I have tried:

  • ./configure --without-libarchive ...
  • ./configure --disable-libarchive ...
  • libarchive_CFLAGS= libarchive_LIBS= ./configure ...

What is the reason for the above failure, and is there any known workaround at build time?
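To confirm which of the installed files actually picked up the dependency, a quick check with ldd can help. This is only a minimal sketch for Linux: the helper names (mentions_libarchive, report_libarchive) are made up for illustration, and the /usr/local paths are an assumption based on the tessdata path in the error message above.

```shell
#!/bin/sh
# Sketch: report whether a binary or shared library is dynamically linked
# against libarchive. Helper names and paths are illustrative assumptions.

# Read `ldd` output on stdin; succeed if a libarchive shared object is listed.
mentions_libarchive() {
    grep -q 'libarchive\.so'
}

# Print a one-line verdict for the file given as $1.
report_libarchive() {
    if [ ! -e "$1" ]; then
        echo "$1: not found"
    elif ldd "$1" 2>/dev/null | mentions_libarchive; then
        echo "$1: links libarchive"
    else
        echo "$1: no libarchive dependency"
    fi
}

# Assumed install prefix; adjust to wherever `make install` put the files.
report_libarchive /usr/local/bin/tesseract
report_libarchive /usr/local/lib/libtesseract.so
```

Running it again after each configure attempt shows whether the option had any effect on the result, independently of what the configure output claims.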

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 26 (14 by maintainers)

Most upvoted comments

IMO, since we don’t have an official tessdata repo with the new format, by default tesseract should not link with libarchive (even when it is found on the system).

There should be a ‘--with-libarchive’ option and a ‘--with-libcurl’ option.