kraken: Unable to segment specific images

This problem occurs with 11 out of a set of ~646 PNGs, all of which plopped out of the exact same processing pipeline, scanned on exactly the same hardware.

Both models (seg & rec) were trained from the binary_datasets branch about a week ago.

$ pip show scikit-image
Name: scikit-image
Version: 0.17.2

$ kraken --version # master branch
kraken, version 3.0.7

$ kraken -d cuda:0 -i vol02_page0002_f002.png vol02_page0002_f002.xml -a segment -bl -i ~/.../seg_best.mlmodel ocr -m ~/.../rec_best.mlmodel
[0.0086] Baseline model (~/.../seg_best.mlmodel) given but legacy segmenter selected. Forcing to -bl. 
WARNING:root:Torch version 1.10.1+cu102 has not been tested with coremltools. You may run into unexpected errors. Torch 1.9.1 is the most recent version that has been tested.
Loading ANN ~/.../seg_best.mlmodel     ✓
Loading ANN default     ✓
Segmenting      [19.1791] Polygonizer failed on line 0: all the input arrays must have same number of dimensions, but the array at index 0 has 1 dimension(s) and the array at index 1 has 2 dimension(s) 
[19.2210] Polygonizer failed on line 0: all the input arrays must have same number of dimensions, but the array at index 0 has 1 dimension(s) and the array at index 1 has 2 dimension(s) 
✓
Processing  [####################################]  100%          
Writing recognition results for vol02_page0002_f002.png TopologyException: side location conflict at 2232 4431.6923076923076
[38.5192] Failed processing vol02_page0002_f002.png: No Shapely geometry can be created from null value 
Traceback (most recent call last):
  File "/home/escriptorium/escriptorium/env/bin/kraken", line 8, in <module>
    sys.exit(cli())
  File "/home/escriptorium/escriptorium/env/lib/python3.7/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/escriptorium/escriptorium/env/lib/python3.7/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/escriptorium/escriptorium/env/lib/python3.7/site-packages/click/core.py", line 1691, in invoke
    return _process_result(rv)
  File "/home/escriptorium/escriptorium/env/lib/python3.7/site-packages/click/core.py", line 1628, in _process_result
    value = ctx.invoke(self._result_callback, value, **ctx.params)
  File "/home/escriptorium/escriptorium/env/lib/python3.7/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/escriptorium/escriptorium/env/lib/python3.7/site-packages/kraken/kraken.py", line 380, in process_pipeline
    task(input=input, output=output)
  File "/home/escriptorium/escriptorium/env/lib/python3.7/site-packages/kraken/kraken.py", line 252, in recognizer
    ctx.meta['output_mode']))
  File "/home/escriptorium/escriptorium/env/lib/python3.7/site-packages/kraken/serialization.py", line 204, in serialize
    pols = unary_union(pols)
  File "/home/escriptorium/escriptorium/env/lib/python3.7/site-packages/shapely/ops.py", line 161, in unary_union
    return geom_factory(lgeos.methods['unary_union'](collection))
  File "/home/escriptorium/escriptorium/env/lib/python3.7/site-packages/shapely/geometry/base.py", line 73, in geom_factory
    raise ValueError("No Shapely geometry can be created from null value")
ValueError: No Shapely geometry can be created from null value

But other images from the same set are processed without errors:

$ kraken -d cuda:0 -i vol02_page0212_f001.png vol02_page0212_f001.xml -a segment -bl -i ~/.../seg_best.mlmodel ocr -m ~/.../rec_best.mlmodel
[0.0123] Baseline model (~/.../seg_best.mlmodel) given but legacy segmenter selected. Forcing to -bl. 
WARNING:root:Torch version 1.10.1+cu102 has not been tested with coremltools. You may run into unexpected errors. Torch 1.9.1 is the most recent version that has been tested.
Loading ANN ~/.../seg_best.mlmodel     ✓
Loading ANN default     ✓
Segmenting      ✓
Processing  [####################################]  100%          
Writing recognition results for vol02_page0212_f001.png ✓
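Since only a handful of pages fail, it may help to sort captured kraken output by failure type. The following is a minimal sketch, assuming the console output of each run has been saved as text. The helper name `classify_kraken_log` is hypothetical, and the patterns are based only on the messages quoted in this report, so other failure messages would go unnoticed.

```python
import re

# Patterns copied from the log messages quoted in this report.
POLYGONIZER_RE = re.compile(r"Polygonizer failed on line (\d+)")
FAILED_RE = re.compile(r"Failed processing (\S+): (.*)")


def classify_kraken_log(log_text):
    """Summarize one captured kraken run (hypothetical helper)."""
    # Line indices for which the polygonizer reported a failure.
    polygonizer_lines = [int(n) for n in POLYGONIZER_RE.findall(log_text)]
    # The final "Failed processing <file>: <reason>" message, if any.
    failed = FAILED_RE.search(log_text)
    return {
        "polygonizer_failures": polygonizer_lines,
        "failed_file": failed.group(1) if failed else None,
        "error": failed.group(2).strip() if failed else None,
        "topology_exception": "TopologyException" in log_text,
    }
```

Running this over the saved output of all pages would show whether every failing page has both the polygonizer warning and the TopologyException, or whether the two occur independently.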

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 27 (26 by maintainers)

Most upvoted comments

I can reproduce your bug with 1.8.2 and another unrelated TopologyException for 1.8.4. Awesome.

Urrrgh, another shapely/GEOS bug. I’ll look into it. In fact, the code just above your instrumentation is there to circumvent TopologyExceptions caused by corner cases in the unary_union function. Apparently, we haven’t caught them all yet. Why can’t geometry be simple?
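For illustration, a guard of this general kind could look roughly like the sketch below. It is not the code in kraken's serialization.py: the helper name `safe_union` is hypothetical, and repairing invalid polygons with `buffer(0)` before the union is just one common workaround, assumed here for the example.

```python
from shapely.errors import ShapelyError
from shapely.geometry import Polygon
from shapely.ops import unary_union


def safe_union(polygons):
    """Union polygons defensively (hypothetical helper, not kraken's code)."""
    cleaned = []
    for pol in polygons:
        # buffer(0) is a common trick to repair self-intersecting rings.
        if not pol.is_valid:
            pol = pol.buffer(0)
        if not pol.is_empty:
            cleaned.append(pol)
    try:
        return unary_union(cleaned)
    except (ShapelyError, ValueError):
        # Shapely 1.8 surfaces a failed GEOS union as ValueError (see the
        # traceback above); other versions are assumed to raise a
        # ShapelyError subclass instead.
        return None


bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])  # self-intersecting
square = Polygon([(1, 0), (3, 0), (3, 2), (1, 2)])
merged = safe_union([bowtie, square])
```

Returning `None` keeps the caller in control of what to do with a page whose line polygons cannot be merged, for example skipping the union and writing the lines individually. Note that `buffer(0)` can change the shape of a badly broken polygon, so this trades geometric fidelity for robustness.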