core: Matching PAGE imageFilename to mets:file when imageFilename is not a URL

Scenario:

  1. Image files and PAGE referencing those image files by relative filepath:

    <Page imageFilename="foo.tif"/>
    
  2. Create a METS file and run workspace add:

    <mets:file GROUPID="page0001" xlink:href="file://path/to/bla/foo.tif"/>
    

Now the PAGE imageFilename and the xlink:href of the corresponding mets:file no longer match.
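To make the mismatch concrete, here is a minimal Python sketch using only the two values from the scenario above. The helper name `same_basename` is hypothetical and not part of any OCR-D API. It only shows that a direct string comparison fails even though both values point at the same file name.

```python
import posixpath

# Values taken verbatim from the scenario above.
image_filename = "foo.tif"               # PAGE imageFilename
href = "file://path/to/bla/foo.tif"      # xlink:href recorded in the METS


def same_basename(a: str, b: str) -> bool:
    """Hypothetical helper: compare only the last path segment of two references."""
    return posixpath.basename(a) == posixpath.basename(b)


# A naive equality check cannot match the two references ...
direct_match = image_filename == href
# ... although both end in the same file name.
basename_match = same_basename(image_filename, href)
```

Comparing base names is only a diagnostic here, not a proposed fix, since two different images in different directories could share a file name.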

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 34 (19 by maintainers)

Most upvoted comments

Revisiting this with @tboenig:

  • imageFilename in PAGE must always be a relative file path relative to that PAGE file, otherwise tools like Aletheia or PAGEViewer won’t work
  • mets:FLocat is ideally a relative path from the mets.xml

So we need logic that determines the relative path from mets.xml to the image by resolving the imageFilename of a PAGE file against that PAGE file's own path relative to mets.xml.

  • mets.xml: OCR-D-PAGE/foo.xml
  • OCR-D-PAGE/foo.xml: …/OCR-D-IMG/foo.tif
  • => OCR-D-IMG/foo.tif <- mets:FLocat of that image in mets.xml
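The resolution step in the example above could be sketched in Python roughly as follows. This is not the actual OCR-D implementation: the function name `image_path_from_mets` is hypothetical, and the usage example assumes that the elided prefix of the imageFilename is a single parent-directory step (`../`).

```python
import posixpath


def image_path_from_mets(page_href: str, image_filename: str) -> str:
    """Resolve a PAGE imageFilename to a path relative to mets.xml.

    page_href:      path of the PAGE file relative to mets.xml
    image_filename: imageFilename as written in that PAGE file,
                    i.e. relative to the PAGE file itself
    """
    # Directory of the PAGE file, relative to mets.xml.
    page_dir = posixpath.dirname(page_href)
    # Join the two and collapse any "../" segments.
    return posixpath.normpath(posixpath.join(page_dir, image_filename))


# Assumption: the imageFilename is "../OCR-D-IMG/foo.tif".
resolved = image_path_from_mets("OCR-D-PAGE/foo.xml", "../OCR-D-IMG/foo.tif")
print(resolved)  # OCR-D-IMG/foo.tif
```

`posixpath` is used rather than `os.path` so that the result keeps forward slashes on every platform, which is what a METS href would need.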