tesseract: LSTM: User patterns do not work

ref: https://groups.google.com/forum/#!msg/tesseract-ocr/S9CIK3jOMWw/vVBZULrJ9xcJ

I tried the bazaar config for user patterns suggested in the post above (pattern: \A\A\d\d\d\A\A) with the latest Windows binary. It does not seem to work. Does this functionality work on Linux?

Input, output, and config files are attached. I added a .txt extension to bazaar and eng.user-patterns in order to upload them here.
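For readers who want to reproduce the setup, here is a minimal sketch of how the pattern file and the command line might be prepared. It assumes the pattern file holds one pattern per line and is named `<lang>.user-patterns`, and that the config is passed by name as the last argument. The image name `patterntest.png`, the output base `out`, and both helper functions are hypothetical, and the sketch only builds the command without running Tesseract.

```python
from pathlib import Path


def write_user_patterns(directory, patterns, lang="eng"):
    """Write one pattern per line to <directory>/<lang>.user-patterns."""
    path = Path(directory) / f"{lang}.user-patterns"
    path.write_text("\n".join(patterns) + "\n", encoding="utf-8")
    return path


def build_tesseract_command(image, output_base, config="bazaar", lang="eng"):
    """Return the argv list: tesseract IMAGE OUTPUTBASE -l LANG CONFIG."""
    return ["tesseract", image, output_base, "-l", lang, config]


if __name__ == "__main__":
    # Assumption: the file must end up in the tessdata directory in use.
    patterns_file = write_user_patterns(".", [r"\A\A\d\d\d\A\A"])
    print(patterns_file)
    print(" ".join(build_tesseract_command("patterntest.png", "out")))
```

Whether the patterns then influence recognition is exactly what this issue questions, so the sketch only covers the file layout and invocation.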

patterntest

OUTPUT

0011917
OX345PT
PT7895M
BA409QT
OMOOKM
WE4321M

OOLI9T7
OX345PT
PT789SM
BA409QT
OMOOKMI
WE432LM

OOLI9T7
OX345PT
PT7898M
BA409QT
OMOOKMI
WE432LM

patternbazaar.txt

bazaar.txt eng.user-patterns.txt

About this issue

State: closed
Created 8 years ago
Reactions: 8
Comments: 16 (5 by maintainers)

Most upvoted comments

I can tell you that in the Tesseract forum many users ask about these files. They are disappointed that there is no effect on accuracy when using them with their input.

The input is usually not a document but something like a receipt, passport, or car license plate, with a small set of known words/patterns.

+11

amitdo on Dec 7, 2016

Any updates?

galharth on Dec 30, 2017

In addition to the cases mentioned by Amit, there are users who would like to use the user_words dictionary in addition to Tesseract’s wordlist.

Some examples of user words could be client names or industry-specific terminology, e.g. medical or pharmaceutical.

Is it possible to allow for both kinds of scenarios, based on some config/variable?

Shreeshrii on Dec 8, 2016