tesseract: terminate called after throwing an instance of 'std::bad_alloc'
Hello,
First, thanks for your work. I am trying to run Tesseract 4, but I am getting an error:
Info in bmfCreate: Generating pixa of bitmap fonts from string
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)
Steps to reproduce (with a Dockerfile):
FROM ubuntu
RUN apt-get update && apt-get install -y \
autoconf \
automake \
libtool \
autoconf-archive \
pkg-config \
libpng12-dev \
libjpeg8-dev \
libtiff5-dev \
zlib1g-dev \
libicu-dev \
libpango1.0-dev \
libcairo2-dev \
git \
curl && \
rm -rf /var/lib/apt/lists/*
RUN curl http://www.leptonica.org/source/leptonica-1.74.1.tar.gz -o leptonica-1.74.1.tar.gz && \
tar -zxvf leptonica-1.74.1.tar.gz && \
cd leptonica-1.74.1 && ./configure && make && make install && \
cd .. && rm -rf leptonica*
RUN git clone --depth 1 https://github.com/tesseract-ocr/tesseract.git && \
cd tesseract && \
./autogen.sh && \
./configure --enable-debug && \
LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make && \
make install && \
ldconfig && \
make training && \
make training-install && \
cd .. && rm -rf tesseract
# Get basic traineddata
RUN curl https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata > eng.traineddata && \
mv eng.traineddata /usr/local/share/tessdata/
RUN curl https://github.com/tesseract-ocr/tessdata/raw/master/fra.traineddata > fra.traineddata && \
mv fra.traineddata /usr/local/share/tessdata/
Then:
docker build -t tesseract4 .
docker run tesseract4
docker run -t -i tesseract4 /bin/bash
mkdir test
cd test
curl http://tleyden-misc.s3.amazonaws.com/blog_images/ocr_test.png > test.png
tesseract test.png out
Can someone explain what is happening?
For information, I have 2471 megabytes of memory remaining.
Thanks in advance
About this issue
- State: closed
- Created 7 years ago
- Comments: 33 (3 by maintainers)
The command
curl https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata > eng.traineddata
does not get the expected data file, but gets an HTML redirection file. Use
curl -LO https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata
(and similar for other languages), then Tesseract with Docker works for me. With the bad data file, I get an error message:
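Independent of the exact message, a quick way to tell whether a download went wrong is to look at the first bytes of the file. The sketch below is only an illustration: the helper name looks_like_html is made up here, and it assumes the redirect page starts with an <html> or <!DOCTYPE html> marker, which may not hold for every server.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tesseract or curl): succeeds if the file
# starts like an HTML page rather than binary traineddata.
looks_like_html() {
    # Inspect only the first 512 bytes; LC_ALL=C keeps grep from choking
    # on arbitrary binary content.
    head -c 512 "$1" | LC_ALL=C grep -qiE '<!DOCTYPE html|<html'
}

# Example: check a downloaded file before moving it into tessdata.
f=eng.traineddata
if [ -f "$f" ] && looks_like_html "$f"; then
    echo "$f looks like an HTML page, not traineddata; re-download with curl -L" >&2
fi
```

If the check fires, the file is most likely the redirect page described above, and downloading it again with curl -LO should fetch the real data.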