kraken: Unable to segment specific images
This problem occurs with 11 out of a set of ~646 PNGs, all of which plopped out of the exact same processing pipeline, scanned on exactly the same hardware.
Both models (seg & rec) were trained from the binary_datasets branch about a week ago.
$ pip show scikit-image
Name: scikit-image
Version: 0.17.2
$ kraken --version # master branch
kraken, version 3.0.7
$ kraken -d cuda:0 -i vol02_page0002_f002.png vol02_page0002_f002.xml -a segment -bl -i ~/.../seg_best.mlmodel ocr -m ~/.../rec_best.mlmodel
[0.0086] Baseline model (~/.../seg_best.mlmodel) given but legacy segmenter selected. Forcing to -bl.
WARNING:root:Torch version 1.10.1+cu102 has not been tested with coremltools. You may run into unexpected errors. Torch 1.9.1 is the most recent version that has been tested.
Loading ANN ~/.../seg_best.mlmodel ✓
Loading ANN default ✓
Segmenting [19.1791] Polygonizer failed on line 0: all the input arrays must have same number of dimensions, but the array at index 0 has 1 dimension(s) and the array at index 1 has 2 dimension(s)
[19.2210] Polygonizer failed on line 0: all the input arrays must have same number of dimensions, but the array at index 0 has 1 dimension(s) and the array at index 1 has 2 dimension(s)
✓
Processing [####################################] 100%
Writing recognition results for vol02_page0002_f002.png TopologyException: side location conflict at 2232 4431.6923076923076
[38.5192] Failed processing vol02_page0002_f002.png: No Shapely geometry can be created from null value
Traceback (most recent call last):
  File "/home/escriptorium/escriptorium/env/bin/kraken", line 8, in <module>
    sys.exit(cli())
  File "/home/escriptorium/escriptorium/env/lib/python3.7/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/escriptorium/escriptorium/env/lib/python3.7/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/escriptorium/escriptorium/env/lib/python3.7/site-packages/click/core.py", line 1691, in invoke
    return _process_result(rv)
  File "/home/escriptorium/escriptorium/env/lib/python3.7/site-packages/click/core.py", line 1628, in _process_result
    value = ctx.invoke(self._result_callback, value, **ctx.params)
  File "/home/escriptorium/escriptorium/env/lib/python3.7/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/escriptorium/escriptorium/env/lib/python3.7/site-packages/kraken/kraken.py", line 380, in process_pipeline
    task(input=input, output=output)
  File "/home/escriptorium/escriptorium/env/lib/python3.7/site-packages/kraken/kraken.py", line 252, in recognizer
    ctx.meta['output_mode']))
  File "/home/escriptorium/escriptorium/env/lib/python3.7/site-packages/kraken/serialization.py", line 204, in serialize
    pols = unary_union(pols)
  File "/home/escriptorium/escriptorium/env/lib/python3.7/site-packages/shapely/ops.py", line 161, in unary_union
    return geom_factory(lgeos.methods['unary_union'](collection))
  File "/home/escriptorium/escriptorium/env/lib/python3.7/site-packages/shapely/geometry/base.py", line 73, in geom_factory
    raise ValueError("No Shapely geometry can be created from null value")
ValueError: No Shapely geometry can be created from null value
But other images from the same set process without errors:
$ kraken -d cuda:0 -i vol02_page0212_f001.png vol02_page0212_f001.xml -a segment -bl -i ~/.../seg_best.mlmodel ocr -m ~/.../rec_best.mlmodel
[0.0123] Baseline model (~/.../seg_best.mlmodel) given but legacy segmenter selected. Forcing to -bl.
WARNING:root:Torch version 1.10.1+cu102 has not been tested with coremltools. You may run into unexpected errors. Torch 1.9.1 is the most recent version that has been tested.
Loading ANN ~/.../seg_best.mlmodel ✓
Loading ANN default ✓
Segmenting ✓
Processing [####################################] 100%
Writing recognition results for vol02_page0212_f001.png ✓
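When only a handful of pages in a large batch fail, it can help to pull the failing file names out of the captured kraken output. The following is a minimal sketch, assuming the output has been saved as text and that failure lines follow the "Failed processing <file>: <reason>" format seen in the log above. The helper name failed_pages is hypothetical and not part of kraken, and file names containing spaces would not be matched.

```python
import re

# Matches lines such as:
# [38.5192] Failed processing vol02_page0002_f002.png: No Shapely geometry ...
# Assumption: the file name contains no whitespace.
FAILED_RE = re.compile(r"Failed processing (?P<name>\S+): (?P<reason>.*)")


def failed_pages(log_text):
    """Return (filename, reason) pairs for every failure line in the log."""
    results = []
    for line in log_text.splitlines():
        match = FAILED_RE.search(line)
        if match:
            results.append((match.group("name"), match.group("reason").strip()))
    return results


if __name__ == "__main__":
    sample = (
        "Writing recognition results for vol02_page0212_f001.png\n"
        "[38.5192] Failed processing vol02_page0002_f002.png: "
        "No Shapely geometry can be created from null value\n"
    )
    for name, reason in failed_pages(sample):
        print(name, "->", reason)
```

Grouping the returned reasons would show whether all failing pages hit the same Shapely error or several different ones.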
About this issue
- State: open
- Created 2 years ago
- Comments: 27 (26 by maintainers)
Commits related to this issue
- update dependencies * bump up most requirements up to latest releases * pin shapely to a 1.7.x release as 1.8 causes crashes in the serializer (#319) — committed to mittagessen/kraken by mittagessen 2 years ago
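To see whether a given environment falls inside the pinned 1.7.x series, the version string reported by pip show shapely can be compared against the major.minor pair. This is a minimal sketch, assuming plain dotted numeric version strings; in_pinned_series is a hypothetical helper, not part of kraken or shapely.

```python
def in_pinned_series(version, series=(1, 7)):
    """Return True if `version` (e.g. "1.7.1") has the given major.minor."""
    parts = version.split(".")
    try:
        major_minor = tuple(int(p) for p in parts[:2])
    except ValueError:
        # Non-numeric components (e.g. "abc") are treated as outside the pin.
        return False
    return major_minor == tuple(series)


if __name__ == "__main__":
    # 1.8.2 and 1.8.4 are the versions reported as failing in this thread.
    for v in ("1.7.1", "1.8.2", "1.8.4"):
        print(v, in_pinned_series(v))
```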
I can reproduce your bug with shapely 1.8.2, and I get another, unrelated TopologyException with 1.8.4. Awesome.
Urrrgh, another shapely/GEOS bug. I’ll look into it. In fact, the code just above your instrumentation is there to circumvent TopologyExceptions caused by corner cases in the unary_union function. Apparently, we haven’t caught them all yet. Why can’t geometry be simple?