documentation: Drupal / Gemini File UUID Mismatch

While testing riprap, I discovered that several of my Media aren’t being picked up by check_fixity because the UUID reported by the JSON API (sourced from Drupal’s file_managed.uuid field) doesn’t match the UUID stored in Gemini for that file. However, the problem is inconsistent: some UUIDs match, while many don’t.

For example, after adding some print statements to riprap’s PluginFetchResourceListFromDrupal.php, one of my successful items reports:

Node id: 233784
	media_url: http://dams.library.unlv.edu/node/233784/media
	field_media_file uuid: 9471ac1a-116e-4220-a93b-d03d5aa292a0
	fedora_url: http://localhost:8080/fcrepo/rest/masters/ent/ent001450-056.tif

Indeed, the Gemini database has the correct UUID for this item:

+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+------------------------------------------------------------------------------+------------------------------------------------------------------------------------+---------------------+---------------------+
| fedora_hash                                                                                                                      | drupal_hash                                                                                                                      | uuid                                 | drupal_uri                                                                   | fedora_uri                                                                         | dateCreated         | dateUpdated         |
+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+------------------------------------------------------------------------------+------------------------------------------------------------------------------------+---------------------+---------------------+
| 2228c3bf188614f4449821f63328d4c6a7a2dbf698e18270e6388151f739e1cd61546c949950d7ce7b188c1558422700475c3f69752694f92beeeddf5963ae54 | aa30949313684de1b375cc4c687a344cf4d8c38b519e2d5a4c750ae897d2edd03828639a6d10d52248c90d4fada5bf70fb7617430a4a4ea2ee289c40c7d18775 | 9471ac1a-116e-4220-a93b-d03d5aa292a0 | http://dams.library.unlv.edu/_flysystem/fedora/masters/ent/ent001450-056.tif | http://localhost:8080/fcrepo/rest/masters/ent/ent001450-056.tif                    | 2019-03-01 14:07:19 | 2019-03-01 14:07:19 |
+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+------------------------------------------------------------------------------+------------------------------------------------------------------------------------+---------------------+---------------------+

Compared to Drupal for the corresponding file:

+-------+--------------------------------------+----------+------+-------------------+----------------------------------------+------------+----------+--------+------------+------------+
| fid   | uuid                                 | langcode | uid  | filename          | uri                                    | filemime   | filesize | status | created    | changed    |
+-------+--------------------------------------+----------+------+-------------------+----------------------------------------+------------+----------+--------+------------+------------+
| 92374 | 9471ac1a-116e-4220-a93b-d03d5aa292a0 | en       | NULL | ent001450-056.tif | fedora://masters/ent/ent001450-056.tif | image/tiff | 95503106 |      1 | 1551457224 | 1551981341 |
+-------+--------------------------------------+----------+------+-------------------+----------------------------------------+------------+----------+--------+------------+------------+

However, one of my unsuccessful ones displays:

Node id: 205112
	media_url: http://dams.library.unlv.edu/node/205112/media
	field_media_file uuid: de77f71f-6321-4cb7-900a-080ffb219cbf
	fedora_url: 

‘fedora_url’ is blank because the lookup $fedora_url = $this->getFedoraUrl($media['field_media_file'][0]['target_uuid']);, which queries Gemini by UUID, returned false. Indeed, the Gemini database has a different UUID for this item (‘de77f71f-6321-4cb7-900a-080ffb219cbf’ in Drupal, ‘498944af-2773-4d4b-9526-9f7a08d326da’ in Gemini):

+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+------------------------------------------------------------------------------+------------------------------------------------------------------------------------+---------------------+---------------------+
| fedora_hash                                                                                                                      | drupal_hash                                                                                                                      | uuid                                 | drupal_uri                                                                   | fedora_uri                                                                         | dateCreated         | dateUpdated         |
+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+------------------------------------------------------------------------------+------------------------------------------------------------------------------------+---------------------+---------------------+
| 6c2ed41faef22f2ee0cfeda0ce9a5303c5771f3b9900ad3d2713b08b42d6d8a0167b4da1b89fb22f5abe4c668e22575ac7f4d8cbfb7af9eba52569320ff1fb38 | 347fb4ee386c59798276ca8e4576dbf0ace0d48c9a9d43a25705cada1d9761f77f70a36a10aa2009d061373ef599404660bf3f9de4cfe424a1b6df99c5a93c9f | 498944af-2773-4d4b-9526-9f7a08d326da | http://dams.library.unlv.edu/_flysystem/fedora/masters/ent/ent000725-013.tif | http://localhost:8080/fcrepo/rest/masters/ent/ent000725-013.tif                    | 2019-02-19 20:20:10 | 2019-02-19 20:20:10 |
+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+------------------------------------------------------------------------------+------------------------------------------------------------------------------------+---------------------+---------------------+

Compared to Drupal:

+-------+--------------------------------------+----------+------+-------------------+----------------------------------------+------------+-----------+--------+------------+------------+
| fid   | uuid                                 | langcode | uid  | filename          | uri                                    | filemime   | filesize  | status | created    | changed    |
+-------+--------------------------------------+----------+------+-------------------+----------------------------------------+------------+-----------+--------+------------+------------+
| 57984 | de77f71f-6321-4cb7-900a-080ffb219cbf | en       | NULL | ent000725-013.tif | fedora://masters/ent/ent000725-013.tif | image/tiff | 145856244 |      1 | 1550790127 | 1551798977 |
+-------+--------------------------------------+----------+------+-------------------+----------------------------------------+------------+-----------+--------+------------+------------+

It is not clear to me why some UUIDs would be recorded correctly in Gemini while others aren’t. Any theories about the cause, or strategies for fixing the discrepancies, are welcome.
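One rough way to find every affected file is to match Drupal and Gemini rows on the URI instead of the UUID. The sketch below is a hypothetical Python helper, not part of riprap or Gemini. It assumes that Gemini’s drupal_uri is always the site base URL plus /_flysystem/ plus the stream-wrapper scheme and path, a pattern inferred only from the two rows above, and it takes the rows as plain tuples (for example, exported query results) rather than querying either database.

```python
from typing import Iterable, List, Tuple

# Assumption: the site base URL seen in Gemini's drupal_uri column above.
DRUPAL_BASE = "http://dams.library.unlv.edu"


def drupal_uri_for(file_uri: str, base: str = DRUPAL_BASE) -> str:
    """Turn fedora://path into the Flysystem URL Gemini appears to store."""
    scheme, sep, path = file_uri.partition("://")
    if not sep:
        raise ValueError(f"not a stream-wrapper URI: {file_uri!r}")
    return f"{base}/_flysystem/{scheme}/{path}"


def find_mismatches(
    drupal_files: Iterable[Tuple[str, str]],  # (uuid, uri) from file_managed
    gemini_rows: Iterable[Tuple[str, str]],   # (uuid, drupal_uri) from Gemini
) -> List[Tuple[str, str, str]]:
    """Return (drupal_uuid, gemini_uuid, drupal_uri) where the UUIDs disagree.

    Files with no Gemini row at all are skipped, not reported.
    """
    gemini_by_uri = {drupal_uri: uuid for uuid, drupal_uri in gemini_rows}
    mismatches = []
    for uuid, uri in drupal_files:
        url = drupal_uri_for(uri)
        gemini_uuid = gemini_by_uri.get(url)
        if gemini_uuid is not None and gemini_uuid != uuid:
            mismatches.append((uuid, gemini_uuid, url))
    return mismatches


# The two files from this issue, copied from the query output above.
SAMPLE_DRUPAL = [
    ("9471ac1a-116e-4220-a93b-d03d5aa292a0", "fedora://masters/ent/ent001450-056.tif"),
    ("de77f71f-6321-4cb7-900a-080ffb219cbf", "fedora://masters/ent/ent000725-013.tif"),
]
SAMPLE_GEMINI = [
    ("9471ac1a-116e-4220-a93b-d03d5aa292a0",
     "http://dams.library.unlv.edu/_flysystem/fedora/masters/ent/ent001450-056.tif"),
    ("498944af-2773-4d4b-9526-9f7a08d326da",
     "http://dams.library.unlv.edu/_flysystem/fedora/masters/ent/ent000725-013.tif"),
]

for row in find_mismatches(SAMPLE_DRUPAL, SAMPLE_GEMINI):
    print(row)
```

Only the second file is reported, since its Gemini row exists under a different UUID.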

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 34 (34 by maintainers)

Most upvoted comments

@seth-shaw-unlv Confirmed.

vagrant@claw:/var/www/html/drupal/web/modules/contrib/islandora$ curl -i -H "Authorization: Bearer islandora" -X PUT -d '{"drupal":"abc","fedora":"123"}' -H "Content-Type: application/json" localhost:8000/gemini/abc123
HTTP/1.1 204 No Content
Date: Thu, 09 May 2019 15:08:56 GMT
Server: Apache/2.4.18 (Ubuntu)
X-Powered-By: PHP/7.1.28-1+ubuntu16.04.1+deb.sury.org+3
Cache-Control: no-cache, private
Content-Type: text/html; charset=UTF-8

vagrant@claw:/var/www/html/drupal/web/modules/contrib/islandora$ curl -i -H "Authorization: Bearer islandora" -X PUT -d '{"drupal":"abc","fedora":"123"}' -H "Content-Type: application/json" localhost:8000/gemini/def456
HTTP/1.1 204 No Content
Date: Thu, 09 May 2019 15:09:07 GMT
Server: Apache/2.4.18 (Ubuntu)
X-Powered-By: PHP/7.1.28-1+ubuntu16.04.1+deb.sury.org+3
Cache-Control: no-cache, private
Content-Type: text/html; charset=UTF-8

vagrant@claw:/var/www/html/drupal/web/modules/contrib/islandora$ curl -i -H "Authorization: Bearer islandora" localhost:8000/gemini/def456
HTTP/1.1 404 Not Found
Date: Thu, 09 May 2019 15:10:33 GMT
Server: Apache/2.4.18 (Ubuntu)
X-Powered-By: PHP/7.1.28-1+ubuntu16.04.1+deb.sury.org+3
Cache-Control: no-cache, private
Content-Length: 36
Content-Type: text/html; charset=UTF-8

Could not locate URL pair for def456

I’m on board. 409 FTW.

How in the world did I time travel? GitHub had best unlock its secrets to me. I have children, dangit, and need more time for everything!

@whikloj It runs the update query, which is applied successfully, but it just overwrites the row with the exact same data. So it’s technically successful.

304 is meant for GETs, HEADs, caching, etc., but hijacking it for a PUT response when nothing changed feels appropriate. It certainly conveys the message better than 200, which I imagine would imply that some operation was performed. And it’s not like the HTTP police are going to come and take us away…
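To make the proposals in this thread concrete (409 when a PUT would collide with another UUID’s pair, 304 when a PUT changes nothing), here is a minimal in-memory sketch in Python. The names put_mapping and UrlPair are hypothetical and the dict stands in for the Gemini table. The 204 for a new or changed mapping matches what the curl session above returned; 304 and 409 are only what this thread proposes, not Gemini’s current behavior.

```python
from typing import Dict, NamedTuple


class UrlPair(NamedTuple):
    drupal: str
    fedora: str


def put_mapping(store: Dict[str, UrlPair], uuid: str, pair: UrlPair) -> int:
    """Apply a PUT to a hypothetical uuid -> UrlPair store and pick a status code."""
    # 409 Conflict: another UUID already owns this exact pair (the case the
    # UNIQUE KEY rejects while the endpoint still answers 204).
    for other_uuid, other_pair in store.items():
        if other_uuid != uuid and other_pair == pair:
            return 409
    # 304 Not Modified: same UUID, same data, nothing to write.
    if store.get(uuid) == pair:
        return 304
    # 204 No Content: mapping created or changed.
    store[uuid] = pair
    return 204


store: Dict[str, UrlPair] = {}
pair = UrlPair(drupal="abc", fedora="123")
print(put_mapping(store, "abc123", pair))  # 204
print(put_mapping(store, "def456", pair))  # 409
print(put_mapping(store, "abc123", pair))  # 304
print("def456" in store)                   # False
```

The linear scan keeps the sketch short; a real service would lean on the database index to detect the conflict.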

Also, I found this while researching:

Ah hah! Gemini has a unique key on the fedora/drupal uri hashes:

UNIQUE KEY `fedora_hash` (`fedora_hash`,`drupal_hash`)

Since neither the Drupal URI nor the Fedora URI should have changed from one entry to the next, the hashes wouldn’t have changed either. The log doesn’t show a delete for the old record, so the database constraint would have refused the new one. However, the 204 response (success, no content) doesn’t reflect the new record’s failure.
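The failure mode can be reproduced outside Gemini with SQLite. This Python sketch is a simplified model, not Gemini’s code: the table layout and the save_pair helper are hypothetical, and SHA-512 is assumed only because the stored hashes are 128 hex digits long. It replays the curl session above: the second UUID is refused by the UNIQUE key on (fedora_hash, drupal_hash), and save_pair reports that refusal instead of leaving a caller free to answer 204.

```python
import hashlib
import sqlite3

# Hypothetical stand-in for Gemini's table; only the UNIQUE key mirrors the issue.
SCHEMA = """
CREATE TABLE Gemini (
    uuid        TEXT PRIMARY KEY,
    drupal_uri  TEXT NOT NULL,
    fedora_uri  TEXT NOT NULL,
    drupal_hash TEXT NOT NULL,
    fedora_hash TEXT NOT NULL,
    UNIQUE (fedora_hash, drupal_hash)
)
"""


def sha512(text: str) -> str:
    # Assumption: the 128-hex-digit hashes in the dumps above suggest SHA-512.
    return hashlib.sha512(text.encode("utf-8")).hexdigest()


def save_pair(conn: sqlite3.Connection, uuid: str, drupal_uri: str, fedora_uri: str) -> bool:
    """Update the row for uuid, or insert it; return False if the UNIQUE key refuses it."""
    params = (drupal_uri, fedora_uri, sha512(drupal_uri), sha512(fedora_uri), uuid)
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            cur = conn.execute(
                "UPDATE Gemini SET drupal_uri = ?, fedora_uri = ?, "
                "drupal_hash = ?, fedora_hash = ? WHERE uuid = ?",
                params,
            )
            if cur.rowcount == 0:  # no row for this uuid yet, so insert one
                conn.execute(
                    "INSERT INTO Gemini (drupal_uri, fedora_uri, drupal_hash, fedora_hash, uuid) "
                    "VALUES (?, ?, ?, ?, ?)",
                    params,
                )
    except sqlite3.IntegrityError:
        # Another uuid already holds this (fedora_hash, drupal_hash) pair.
        return False
    return True


# Replay the curl session: the same URI pair under two different UUIDs.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
first = save_pair(conn, "abc123", "abc", "123")
second = save_pair(conn, "def456", "abc", "123")
print(first, second)                                        # True False
print(conn.execute("SELECT uuid FROM Gemini").fetchall())   # [('abc123',)]
```

Re-saving abc123 with identical data still succeeds, because a row never conflicts with itself; only a different UUID claiming the same pair is refused.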

I mentioned Media because we changed how we index them in Gemini a while ago (a year or so). TL;DR: there are weird timing issues, and it’s really just the file and the node that get indexed in Gemini, not the Media.

And yeah… this is surely just a bug that’s been laying in wait until someone uses Islandora 8 enough to find it.

So what’s different between the two that would make one get indexed properly and the other not? They’re both files in Fedora, right? It’s not like one’s on public and the other isn’t… Or is one a derivative and the other an original file?