documentation: Drupal / Gemini File UUID Mismatch
While testing riprap I discovered that several of my Media aren’t getting picked up by check_fixity because the uuid reported by the JSON API (sourced from the Drupal file_managed.uuid field) doesn’t match the uuid stored in Gemini for that file. The mismatch is inconsistent, though: some uuids match while many don’t.
For example, after adding some print statements to riprap’s PluginFetchResourceListFromDrupal.php, one of my successful items reports:
Node id: 233784
media_url: http://dams.library.unlv.edu/node/233784/media
field_media_file uuid: 9471ac1a-116e-4220-a93b-d03d5aa292a0
fedora_url: http://localhost:8080/fcrepo/rest/masters/ent/ent001450-056.tif
Indeed, the Gemini db does have the correct uuid for this item:
+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+------------------------------------------------------------------------------+------------------------------------------------------------------------------------+---------------------+---------------------+
| fedora_hash | drupal_hash | uuid | drupal_uri | fedora_uri | dateCreated | dateUpdated |
+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+------------------------------------------------------------------------------+------------------------------------------------------------------------------------+---------------------+---------------------+
| 2228c3bf188614f4449821f63328d4c6a7a2dbf698e18270e6388151f739e1cd61546c949950d7ce7b188c1558422700475c3f69752694f92beeeddf5963ae54 | aa30949313684de1b375cc4c687a344cf4d8c38b519e2d5a4c750ae897d2edd03828639a6d10d52248c90d4fada5bf70fb7617430a4a4ea2ee289c40c7d18775 | 9471ac1a-116e-4220-a93b-d03d5aa292a0 | http://dams.library.unlv.edu/_flysystem/fedora/masters/ent/ent001450-056.tif | http://localhost:8080/fcrepo/rest/masters/ent/ent001450-056.tif | 2019-03-01 14:07:19 | 2019-03-01 14:07:19 |
+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+------------------------------------------------------------------------------+------------------------------------------------------------------------------------+---------------------+---------------------+
Compared to Drupal for the corresponding file:
+-------+--------------------------------------+----------+------+-------------------+----------------------------------------+------------+----------+--------+------------+------------+
| fid | uuid | langcode | uid | filename | uri | filemime | filesize | status | created | changed |
+-------+--------------------------------------+----------+------+-------------------+----------------------------------------+------------+----------+--------+------------+------------+
| 92374 | 9471ac1a-116e-4220-a93b-d03d5aa292a0 | en | NULL | ent001450-056.tif | fedora://masters/ent/ent001450-056.tif | image/tiff | 95503106 | 1 | 1551457224 | 1551981341 |
+-------+--------------------------------------+----------+------+-------------------+----------------------------------------+------------+----------+--------+------------+------------+
However, one of my unsuccessful ones displays:
Node id: 205112
media_url: http://dams.library.unlv.edu/node/205112/media
field_media_file uuid: de77f71f-6321-4cb7-900a-080ffb219cbf
fedora_url:
‘fedora_url’ is blank because the call $fedora_url = $this->getFedoraUrl($media['field_media_file'][0]['target_uuid']); which queries Gemini using the uuid, returned false. Indeed, the Gemini database has a different UUID for this item (‘de77f71f-6321-4cb7-900a-080ffb219cbf’ in Drupal, ‘498944af-2773-4d4b-9526-9f7a08d326da’ in Gemini):
+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+------------------------------------------------------------------------------+------------------------------------------------------------------------------------+---------------------+---------------------+
| fedora_hash | drupal_hash | uuid | drupal_uri | fedora_uri | dateCreated | dateUpdated |
+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+------------------------------------------------------------------------------+------------------------------------------------------------------------------------+---------------------+---------------------+
| 6c2ed41faef22f2ee0cfeda0ce9a5303c5771f3b9900ad3d2713b08b42d6d8a0167b4da1b89fb22f5abe4c668e22575ac7f4d8cbfb7af9eba52569320ff1fb38 | 347fb4ee386c59798276ca8e4576dbf0ace0d48c9a9d43a25705cada1d9761f77f70a36a10aa2009d061373ef599404660bf3f9de4cfe424a1b6df99c5a93c9f | 498944af-2773-4d4b-9526-9f7a08d326da | http://dams.library.unlv.edu/_flysystem/fedora/masters/ent/ent000725-013.tif | http://localhost:8080/fcrepo/rest/masters/ent/ent000725-013.tif | 2019-02-19 20:20:10 | 2019-02-19 20:20:10 |
+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+------------------------------------------------------------------------------+------------------------------------------------------------------------------------+---------------------+---------------------+
Compared to Drupal:
+-------+--------------------------------------+----------+------+-------------------+----------------------------------------+------------+-----------+--------+------------+------------+
| fid | uuid | langcode | uid | filename | uri | filemime | filesize | status | created | changed |
+-------+--------------------------------------+----------+------+-------------------+----------------------------------------+------------+-----------+--------+------------+------------+
| 57984 | de77f71f-6321-4cb7-900a-080ffb219cbf | en | NULL | ent000725-013.tif | fedora://masters/ent/ent000725-013.tif | image/tiff | 145856244 | 1 | 1550790127 | 1551798977 |
+-------+--------------------------------------+----------+------+-------------------+----------------------------------------+------------+-----------+--------+------------+------------+
It is not clear to me why some uuids would be correctly recorded in Gemini while others aren’t. Any theories about the cause, or strategies for fixing the discrepancies, are welcome.
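One way to gauge how widespread the mismatch is would be to pair the two tables on the file path and list the rows whose uuids differ. Below is a minimal sketch in Python, assuming the rows have already been exported from Drupal’s file_managed table and from the Gemini table as dicts. The function names and the two prefix constants are hypothetical, inferred only from the example rows above, and would need adjusting for another site.

```python
# Hypothetical prefixes, inferred from the example rows in this issue.
DRUPAL_SCHEME = "fedora://"
GEMINI_DRUPAL_PREFIX = "http://dams.library.unlv.edu/_flysystem/fedora/"


def relative_path(uri, prefix):
    """Return the part of uri after prefix, or None if it does not match."""
    return uri[len(prefix):] if uri.startswith(prefix) else None


def find_uuid_mismatches(drupal_rows, gemini_rows):
    """Pair rows by file path and return those whose uuids disagree.

    drupal_rows: dicts with 'uuid' and 'uri' (from file_managed).
    gemini_rows: dicts with 'uuid' and 'drupal_uri' (from Gemini).
    Returns a list of (path, drupal_uuid, gemini_uuid) tuples.
    """
    gemini_by_path = {}
    for row in gemini_rows:
        path = relative_path(row["drupal_uri"], GEMINI_DRUPAL_PREFIX)
        if path is not None:
            gemini_by_path[path] = row["uuid"]

    mismatches = []
    for row in drupal_rows:
        path = relative_path(row["uri"], DRUPAL_SCHEME)
        if path is None or path not in gemini_by_path:
            continue  # not a Fedora-backed file, or not indexed in Gemini
        if gemini_by_path[path] != row["uuid"]:
            mismatches.append((path, row["uuid"], gemini_by_path[path]))
    return mismatches
```

Fed the two example files from this issue, the function would report only ent000725-013.tif, along with both of its uuids.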
About this issue
- State: closed
- Created 5 years ago
- Comments: 34 (34 by maintainers)
@seth-shaw-unlv Confirmed.
I’m on board. 409 FTW.
How in the world did I time travel? Github best unlock its secrets to me. I have children, dangit, and need more time for everything!
@whikloj It makes the update query, which is successfully applied, but it just overwrites it with the exact same data. So it’s successful in technicality.
304 is for GET and HEADs and caching, etc… but hijacking it for a PUT response when nothing was changed feels appropriate. It certainly conveys the message better than 200, which I imagine would imply that there was some operation performed. And it’s not like the HTTP police are going to come and take us away… Also, found this when researching:
Ah hah! Gemini has a unique key on the fedora/drupal uri hashes:
Since neither the Drupal nor the Fedora URI should have changed from one entry to the other, the hashes wouldn’t have either. The log doesn’t show a delete for the old record, so the database constraint would have refused the new one. However, the 204 response (SUCCESS, NO CONTENT) doesn’t accurately reflect the new record’s failure.
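To illustrate the constraint behaviour described here, the following is a small sketch using Python’s built-in sqlite3 module rather than Gemini’s actual MySQL schema. The table name, the reduced column set, the helper names, and the use of SHA-512 for the hashes are all assumptions (the 128-character hex hashes in the tables are consistent with SHA-512, but that is not confirmed in this thread). The 409 is the response proposed in this discussion, not what Gemini returned at the time.

```python
import hashlib
import sqlite3


def sha512_hex(text):
    # Assumption: the stored hashes are SHA-512 hex digests of the URIs.
    return hashlib.sha512(text.encode("utf-8")).hexdigest()


def make_db():
    """Create an in-memory table with a unique key on the two URI hashes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE gemini_sketch (
            fedora_hash TEXT NOT NULL,
            drupal_hash TEXT NOT NULL,
            uuid        TEXT NOT NULL,
            UNIQUE (fedora_hash, drupal_hash)
        )
        """
    )
    return conn


def save_mapping(conn, uuid, drupal_uri, fedora_uri):
    """Insert a mapping and return the HTTP status a caller might report."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO gemini_sketch (fedora_hash, drupal_hash, uuid) "
                "VALUES (?, ?, ?)",
                (sha512_hex(fedora_uri), sha512_hex(drupal_uri), uuid),
            )
        return 201
    except sqlite3.IntegrityError:
        # The same URI pair is already stored, possibly under another uuid:
        # surface the conflict instead of reporting success.
        return 409
```

With this sketch, saving the ent000725-013.tif URIs first under 498944af-2773-4d4b-9526-9f7a08d326da and then under de77f71f-6321-4cb7-900a-080ffb219cbf leaves the first uuid in place, and the second call reports a conflict rather than a success.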
I mentioned Media because we changed how we index them in Gemini about a year ago. TL;DR there are weird timing issues, and it’s really just the file and the node that get indexed in Gemini, not the Media.
And yeah… this is surely just a bug that’s been lying in wait until someone uses Islandora 8 enough to find it.
So what’s different between the two that would make one get indexed properly and another not? They’re both files in Fedora, right? It’s not like this is because one’s on public or not… Or maybe one is a derivative and the other an original file?