core: Matching PAGE imageFilename to mets:file when imageFilename is not a URL
Scenario:
- Image files and PAGE referencing those image files by relative filepath: <Page imageFilename="foo.tif"/>
- Create a METS file and run workspace add: <mets:file GROUPID="page0001" xlink:href="file://path/to/bla/foo.tif"
Now the PAGE imageFilename and the xlink:href of the corresponding mets:file no longer match.
About this issue
- State: closed
- Created 6 years ago
- Comments: 34 (19 by maintainers)
Commits related to this issue
- Replace OCRD-ZIP with BagIt-based spec — committed to kba/spec by kba 6 years ago
- Workspace validation: Validate that files mentioned in pc:Page/@imageFilename exist in METS and on FS, #176 — committed to kba/ocrd-core by kba 5 years ago
- workspace bagger: update PAGE imageFilenames, #176 — committed to kba/ocrd-core by kba 5 years ago
- Merge pull request #333 from kba/move-files-in-page workspace bagger: update PAGE imageFilenames, #176 — committed to OCR-D/core by kba 5 years ago
- page: imageFilename should emphatically NOT be URL, OCR-D/core#176 — committed to kba/spec by kba 4 years ago
Revisiting this with @tboenig:
- imageFilename in PAGE must always be a relative file path, relative to that PAGE file, otherwise tools like Aletheia or PAGEViewer won't work
- mets:FLocat is ideally a relative path from the mets.xml
So we need logic to determine the relative path from mets.xml to the image by resolving the imageFilename of a PAGE against the relative path to that PAGE.
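
A rough sketch of that resolution logic, assuming both the PAGE file location and the image location are plain POSIX-style relative paths (no file:// prefix). The function names and the directory names in the usage note are made up for illustration and are not part of OCR-D/core:

```python
import posixpath


def image_path_from_mets(page_path_from_mets: str, image_filename: str) -> str:
    """Resolve a PAGE imageFilename against the location of its PAGE file.

    page_path_from_mets: path of the PAGE file, relative to mets.xml
    image_filename:      value of pc:Page/@imageFilename, relative to the PAGE file
    Returns the image path relative to mets.xml.
    (Hypothetical helper, not an OCR-D/core API.)
    """
    page_dir = posixpath.dirname(page_path_from_mets)
    # Join with the PAGE file's directory, then collapse any "../" segments.
    return posixpath.normpath(posixpath.join(page_dir, image_filename))


def image_filename_for_page(page_path_from_mets: str, image_path_from_mets_: str) -> str:
    """Inverse direction: compute the imageFilename to write into a PAGE file.

    Both arguments are paths relative to mets.xml.
    (Hypothetical helper, not an OCR-D/core API.)
    """
    page_dir = posixpath.dirname(page_path_from_mets) or "."
    return posixpath.relpath(image_path_from_mets_, start=page_dir)
```

For example, with a PAGE file at OCR-D-GT/page0001.xml whose imageFilename is ../OCR-D-IMG/foo.tif, the first helper yields OCR-D-IMG/foo.tif, which is the value that could then be compared against the mets:FLocat href. The second helper goes the other way, which is what moving files around (as in the bagger) would need in order to rewrite imageFilename.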