gosseract: Installation Failure on Windows 7

Summary

Installation Failure on Windows 7:

λ go get -t github.com/otiai10/gosseract
# github.com/otiai10/gosseract
tessbridge.cpp:5:10: fatal error: tesseract/baseapi.h: No such file or directory
 #include <tesseract/baseapi.h>
          ^~~~~~~~~~~~~~~~~~~~~
compilation terminated.

Reproducibility

Yes

Reproducibility Frequency

  • 100%

Steps to reproduce

  1. Install Go
  2. Install GCC (64-bit compiler)
  3. Install Git
  4. Install Tesseract from this site https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-3.05.02-20180621.exe
  5. Run λ go get -t github.com/otiai10/gosseract. The build fails with the same tessbridge.cpp error shown in the summary above.

Environment

Windows 7

go env

C:\Users\33133 λ go env
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\33133\AppData\Local\go-build
set GOEXE=.exe
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=C:\Users\33133\go
set GORACE=
set GOROOT=C:\Go
set GOTMPDIR=
set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\33133\AppData\Local\Temp\go-build238513982=/tmp/go-build -gno-record-gcc-switches

go version

λ go version
go version go1.10.3 windows/amd64

tesseract --version

C:\Program Files (x86)\Tesseract-OCR>tesseract --version
tesseract 3.05.02
 leptonica-1.75.3
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.2.0

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 27 (10 by maintainers)

Most upvoted comments

Of course, here are the steps I took:

  • As moolen said:
  1. Grab the header files from your Linux box (/usr/include/tesseract and /usr/include/leptonica) and copy them into this repo under $REPO/include.
  2. Grab the Tesseract precompiled binaries from https://github.com/UB-Mannheim/tesseract/wiki. Extract libtesseract-4.dll to $GOPATH/src/github.com/otiai10/gosseract/tesseract and liblept-5.dll to $GOPATH/src/github.com/otiai10/gosseract/lept.
  3. Set the TESSDATA_PREFIX environment variable to your tessdata path.
  4. Change client.go so the compiler finds the headers and the linker finds the libraries:
// #cgo linux CXXFLAGS: -std=c++0x
// #cgo linux LDFLAGS: -L/usr/local/lib -llept -ltesseract
// #cgo windows CXXFLAGS: -std=c++0x -Iinclude
// #cgo windows LDFLAGS: -Ltesseract -llibtesseract-4 -Llept -lliblept-5
  • Now run go build and copy Tesseract-OCR/*.dll into the same directory as the built executable. Don’t run it with go run main.go.

Here is my demo: https://github.com/veryWrong/gosseract-win-demo

I got a first (native) build on Windows working 🎉 It’s a simple gosseract.Version() call, but it uses the C interface under the hood.

Changes in a nutshell:

  • provide includes/ dir in the gosseract repo
  • build & provide dlls for leptonica & tesseract
  • link against those libs
  • add tesseract dir to $PATH (for remaining shared libs)

I’m not sure where to go from here. I don’t want to expect users to install a full MSVC toolchain to compile Leptonica. I also don’t want to provide an MSI.

There are at least Tesseract binaries for Windows. I found some here: tdhintz/tesseract4win64. I’ll give them a try.

I’ll ask upstream whether they see value in providing a precompiled DLL; then we might be able to leverage that.


@wangsongyan here’s roughly what I did. Please let me know if that helps you.

Prerequisites

Download the Tesseract precompiled binaries from https://github.com/UB-Mannheim/tesseract/wiki and extract them to $GOPATH/src/github.com/otiai10/gosseract/tesseract.

Grab the Leptonica sources from here and build a 64-bit DLL using cppan. This gives you hints.

Grab the header files from your Linux box (/usr/include/tesseract and /usr/include/leptonica) and copy them into this repo under $REPO/include.

Change client.go so the compiler finds the headers and the linker finds the libraries:

// #cgo CXXFLAGS: -std=c++0x -I./include
// #cgo LDFLAGS: -Ltesseract -Lleptonica/build64/bin/Release -lleptonica-1.78.0 -llibtesseract-4

This is how the repo should look once you have everything in place:

$GOPATH/src/github.com/otiai10/gosseract
├── include
│    ├── tesseract/*.h
│    └── leptonica/*.h
├── tesseract/libtesseract-4.dll
└── leptonica/build64/bin/Release/leptonica-1.78.0.dll

(!) Also, make sure that both $REPO/tesseract/ and $REPO/leptonica/build64/bin/Release are in your $PATH.

Test program

package main

import (
    "log"

    "github.com/otiai10/gosseract"
)

func main() {
    log.Println(gosseract.Version())
}

Test if we are able to compile, link and execute it.

$ go build cmd/main.go
$ ./main.exe
2019/05/07 xxxx v4.0.0.20190314

Edit:

Pitfalls I ran into:

Do not rename the provided DLL files. You can inspect a DLL with objdump -p foo.dll to find out its “original” name.

Setting LD_LIBRARY_PATH for MinGW does not work as expected (at least on my machine). Using -L in client.go to specify the library search path does not work.

If you run into the same problem, you can use the following command as a replacement for the library until a fix lands:

cmd := exec.Command(`C:\Program Files\Tesseract-OCR\tesseract.exe`, "image_name", "output")

Tesseract saves the recognized text to output.txt.
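The exec.Command workaround can be wrapped into a small function that runs tesseract and reads back the text file it writes (the CLI appends .txt to the output base name). A minimal sketch, assuming the default install path from the comment and Go 1.16 or newer for os.ReadFile; `ocrWithCLI` is a hypothetical helper name and "image_name" is a placeholder.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// ocrWithCLI runs the tesseract executable on imagePath and returns the
// recognized text, which the CLI writes to outBase + ".txt".
func ocrWithCLI(tesseractExe, imagePath, outBase string) (string, error) {
	cmd := exec.Command(tesseractExe, imagePath, outBase)
	if out, err := cmd.CombinedOutput(); err != nil {
		return "", fmt.Errorf("tesseract failed: %v: %s", err, out)
	}
	text, err := os.ReadFile(outBase + ".txt")
	if err != nil {
		return "", err
	}
	return string(text), nil
}

func main() {
	// Tesseract path from the comment above; "image_name" and "output" are placeholders.
	text, err := ocrWithCLI(`C:\Program Files\Tesseract-OCR\tesseract.exe`, "image_name", "output")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(text)
}
```

Shelling out avoids cgo entirely, at the cost of one process launch and a temporary file per image.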