tesseract: LSTM: User patterns do not work
ref: https://groups.google.com/forum/#!msg/tesseract-ocr/S9CIK3jOMWw/vVBZULrJ9xcJ
I tried the bazaar config for user patterns suggested in the post above ( \A\A\d\d\d\A\A ) with the latest Windows binary. It does not seem to work. Does this functionality work on Linux?
Input, output and config files are attached. I added a .txt extension to bazaar and eng.user-patterns in order to upload them here.

OUTPUT
0011917
OX345PT
PT7895M
BA409QT
OMOOKM
WE4321M
OOLI9T7
OX345PT
PT789SM
BA409QT
OMOOKMI
WE432LM
OOLI9T7
OX345PT
PT7898M
BA409QT
OMOOKMI
WE432LM
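
To make the mismatch concrete, here is a minimal sketch that checks recognized lines against the pattern from the post. It is not Tesseract code and not part of the original report. It assumes the escape meanings described in Tesseract's pattern-file comments (\A upper-case, \a lower-case, \c letter, \d digit, \n letter or digit, \p punctuation, \* repeat) and simplifies them to ASCII; the names `pattern_to_regex`, `ESCAPES` and `first_run` are hypothetical.

```python
import re
import string

# Assumed, ASCII-only mapping of Tesseract user-pattern escapes to regex classes.
ESCAPES = {
    "A": "[A-Z]",                                   # upper-case letter
    "a": "[a-z]",                                   # lower-case letter
    "c": "[A-Za-z]",                                # any letter
    "d": "[0-9]",                                   # digit
    "n": "[A-Za-z0-9]",                             # letter or digit
    "p": "[" + re.escape(string.punctuation) + "]", # punctuation
}

def pattern_to_regex(pattern):
    """Translate a user-pattern line into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            code = pattern[i + 1]
            if code == "*" and parts:
                parts.append("*")  # repeat the previous element
            else:
                # Unknown escapes fall back to the literal character.
                parts.append(ESCAPES.get(code, re.escape(code)))
            i += 2
        else:
            parts.append(re.escape(ch))  # literal character
            i += 1
    return re.compile("".join(parts))

# First seven lines of the OUTPUT block above.
first_run = ["0011917", "OX345PT", "PT7895M", "BA409QT",
             "OMOOKM", "WE4321M", "OOLI9T7"]

rx = pattern_to_regex(r"\A\A\d\d\d\A\A")
matching = [line for line in first_run if rx.fullmatch(line)]
print(matching)
```

Under these assumptions only OX345PT and BA409QT satisfy the pattern; the other lines contain digits where letters are expected or vice versa, which is the kind of error the pattern file was expected to prevent.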
About this issue
- State: closed
- Created 8 years ago
- Reactions: 8
- Comments: 16 (5 by maintainers)
I can tell you that in the Tesseract forum many users ask about these files. They are disappointed that there is no effect on accuracy when using them with their input.
The input is usually not a document but something like a receipt, passport, or car license plate, with a small set of known words/patterns.
Any updates?
In addition to the cases mentioned by Amit, there are users who would like to use the user_words dictionary alongside Tesseract’s wordlist.
Examples of user words could be client names or industry-specific terminology, e.g. medical or pharmaceutical.
Is it possible to allow for both kinds of scenarios, based on some config/variable?