stratisd: missing stratis pool after update to Fedora 39: thin_repair out of metadata space
Hello, I’ve just updated to Fedora 39 and have noticed that 1 of 3 stratis pools has disappeared from stratis management. How can I recover this data?
Running stratis version:
stratisd-3.6.3-1.fc39.x86_64
stratis-cli-3.6.0-1.fc39.noarch
The missing pool is the net.ejjohnson.home pool, listed in the stratis report output below under partially_constructed_pools:
$ stratis report
{
  "name_to_pool_uuid_map": {
    "net.ejjohnson.home": "7e18ddcd-9924-4c92-b926-100a7498630b"
  },
  "partially_constructed_pools": [
    {
      "devices": [
        {
          "device_uuid": "5335c8c3-df0e-4f29-a241-edc58f384e21",
          "devnode": "/dev/sdc",
          "major": 8,
          "minor": 32,
          "pool_uuid": "7e18ddcd-9924-4c92-b926-100a7498630b"
        }
      ],
      "pool_uuid": "7e18ddcd-9924-4c92-b926-100a7498630b"
    }
  ],
  "path_to_ids_map": {
    "/dev/sdc": [
      "7e18ddcd-9924-4c92-b926-100a7498630b",
      "5335c8c3-df0e-4f29-a241-edc58f384e21"
    ]
  },
  "pools": [
    {
      "available_actions": "fully_operational",
      "blockdevs": {
        "cachedevs": [],
        "datadevs": [
          {
            "blksizes": "base: BLKSSSZGET: 512 bytes, BLKPBSZGET: 4096 bytes, crypt: None",
            "in_use": true,
            "path": "/dev/sdb",
            "size": "7814037168 sectors",
            "uuid": "5c9a2e88-cac0-4f38-a988-d72347894123"
          },
          {
            "blksizes": "base: BLKSSSZGET: 512 bytes, BLKPBSZGET: 4096 bytes, crypt: None",
            "in_use": true,
            "path": "/dev/sda",
            "size": "7814037168 sectors",
            "uuid": "ec2d0e73-64fd-437a-ac4b-f5800248f44a"
          }
        ]
      },
      "filesystems": [
        {
          "name": "fs_raw",
          "size": "4294967296 sectors",
          "size_limit": "Not set",
          "used": "885516140544 bytes",
          "uuid": "e8071df3-346a-4753-bda1-524c84d037f9"
        }
      ],
      "fs_limit": 100,
      "name": "io.vos",
      "uuid": "8d86f3f6-8666-490b-99b0-b5c6a5fc7986"
    },
    {
      "available_actions": "fully_operational",
      "blockdevs": {
        "cachedevs": [],
        "datadevs": [
          {
            "blksizes": "base: BLKSSSZGET: 512 bytes, BLKPBSZGET: 512 bytes, crypt: None",
            "in_use": true,
            "path": "/dev/sdd",
            "size": "250069680 sectors",
            "uuid": "a3928e05-964a-4e65-8b76-5ea557ee6ff0"
          }
        ]
      },
      "filesystems": [
        {
          "name": "tmp",
          "size": "2147483648 sectors",
          "size_limit": "Not set",
          "used": "2157969408 bytes",
          "uuid": "8bec6004-cfe7-4820-99c3-1a827d37ba7f"
        }
      ],
      "fs_limit": 100,
      "name": "local.volatile",
      "uuid": "c70a5b86-fa15-45b3-a71e-4a9f78d1e340"
    }
  ],
  "stopped_pools": []
}
Looking at the block devices, the missing pool should be on sdc:
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 3.6T 0 disk
└─stratis-1-private-8d86f3f68666490b99b0b5c6a5fc7986-physical-originsub 253:12 0 7.3T 0 stratis
├─stratis-1-private-8d86f3f68666490b99b0b5c6a5fc7986-flex-thinmeta 253:13 0 7.4G 0 stratis
│ └─stratis-1-private-8d86f3f68666490b99b0b5c6a5fc7986-thinpool-pool 253:15 0 7.3T 0 stratis
│ └─stratis-1-8d86f3f68666490b99b0b5c6a5fc7986-thin-fs-e8071df3346a4753bda1524c84d037f9 253:17 0 2T 0 stratis /mnt/io.vos_raw
├─stratis-1-private-8d86f3f68666490b99b0b5c6a5fc7986-flex-thindata 253:14 0 7.3T 0 stratis
│ └─stratis-1-private-8d86f3f68666490b99b0b5c6a5fc7986-thinpool-pool 253:15 0 7.3T 0 stratis
│ └─stratis-1-8d86f3f68666490b99b0b5c6a5fc7986-thin-fs-e8071df3346a4753bda1524c84d037f9 253:17 0 2T 0 stratis /mnt/io.vos_raw
└─stratis-1-private-8d86f3f68666490b99b0b5c6a5fc7986-flex-mdv 253:16 0 16M 0 stratis
sdb 8:16 0 3.6T 0 disk
└─stratis-1-private-8d86f3f68666490b99b0b5c6a5fc7986-physical-originsub 253:12 0 7.3T 0 stratis
├─stratis-1-private-8d86f3f68666490b99b0b5c6a5fc7986-flex-thinmeta 253:13 0 7.4G 0 stratis
│ └─stratis-1-private-8d86f3f68666490b99b0b5c6a5fc7986-thinpool-pool 253:15 0 7.3T 0 stratis
│ └─stratis-1-8d86f3f68666490b99b0b5c6a5fc7986-thin-fs-e8071df3346a4753bda1524c84d037f9 253:17 0 2T 0 stratis /mnt/io.vos_raw
├─stratis-1-private-8d86f3f68666490b99b0b5c6a5fc7986-flex-thindata 253:14 0 7.3T 0 stratis
│ └─stratis-1-private-8d86f3f68666490b99b0b5c6a5fc7986-thinpool-pool 253:15 0 7.3T 0 stratis
│ └─stratis-1-8d86f3f68666490b99b0b5c6a5fc7986-thin-fs-e8071df3346a4753bda1524c84d037f9 253:17 0 2T 0 stratis /mnt/io.vos_raw
└─stratis-1-private-8d86f3f68666490b99b0b5c6a5fc7986-flex-mdv 253:16 0 16M 0 stratis
sdc 8:32 0 2.7T 0 disk
└─stratis-1-private-7e18ddcd99244c92b926100a7498630b-physical-originsub 253:3 0 2.7T 0 stratis
├─stratis-1-private-7e18ddcd99244c92b926100a7498630b-flex-thinmeta 253:4 0 2.8G 0 stratis
└─stratis-1-private-7e18ddcd99244c92b926100a7498630b-flex-thinmetaspare 253:5 0 16M 0 stratis
sdd 8:48 0 119.2G 0 disk
└─stratis-1-private-c70a5b86fa1545b3a71e4a9f78d1e340-physical-originsub 253:6 0 119.2G 0 stratis
├─stratis-1-private-c70a5b86fa1545b3a71e4a9f78d1e340-flex-thinmeta 253:7 0 112M 0 stratis
│ └─stratis-1-private-c70a5b86fa1545b3a71e4a9f78d1e340-thinpool-pool 253:9 0 119.1G 0 stratis
│ └─stratis-1-c70a5b86fa1545b3a71e4a9f78d1e340-thin-fs-8bec6004cfe7482099c31a827d37ba7f 253:11 0 1T 0 stratis /opt/volatile/tmp
├─stratis-1-private-c70a5b86fa1545b3a71e4a9f78d1e340-flex-thindata 253:8 0 119.1G 0 stratis
│ └─stratis-1-private-c70a5b86fa1545b3a71e4a9f78d1e340-thinpool-pool 253:9 0 119.1G 0 stratis
│ └─stratis-1-c70a5b86fa1545b3a71e4a9f78d1e340-thin-fs-8bec6004cfe7482099c31a827d37ba7f 253:11 0 1T 0 stratis /opt/volatile/tmp
└─stratis-1-private-c70a5b86fa1545b3a71e4a9f78d1e340-flex-mdv 253:10 0 16M 0 stratis
sde 8:64 1 57.3G 0 disk /run/media/erick/Sandisk-Ultra
zram0 252:0 0 8G 0 disk [SWAP]
nvme0n1 259:0 0 465.8G 0 disk
├─nvme0n1p1 259:1 0 1G 0 part /boot
└─nvme0n1p2 259:2 0 464.8G 0 part
└─luks-5f8f4e1c-e8f2-4329-bc24-f56bf78cf515 253:0 0 464.8G 0 crypt
├─fedora-root 253:1 0 445G 0 lvm /
└─fedora-swap 253:2 0 15.7G 0 lvm [SWAP]
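For reference, stratisd logs pool-setup failures, including the thin_check and thin_repair errors discussed in the replies below, to the systemd journal. Assuming the default stratisd.service unit, something like the following should surface them:
$ journalctl -b -u stratisd | grep -iE 'thin_(check|repair)'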
@erickj I’m pleased to hear that your pool is back up.
Regarding question (2) I believe that it will be safe for you to reinstall the current version of stratisd.
Regarding question (1), there are really three issues that affected you, in sequence. The first was that there were stray zeros in a particular region of the thin metadata, on this pool only. The second was that the new version of thin_check detected these stray zeros, which it had not previously done. The third was that when stratisd ran thin_repair, the target device on your pool was too small. I cannot guess why those stray zeros appeared, and it may be very hard to discover that. Regarding whether thin_check should report an error on this condition, I am uncertain; @mingnus is best able to make that decision. Regarding the third problem, that the thin meta spare device is too small to be usable as a repair target: we are working on a remediation that will be safe and well tested, and also on a way of identifying this problem for any other affected users.
I expect we will close this issue in about a week, assuming your pool continues to do well. I've opened a new issue [1] for the remediation task.
Thanks for your patience and clear communication around all of this.
[1] https://github.com/stratis-storage/project/issues/683
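For anyone else hitting this: the check in question can also be run by hand. thin_check (from the device-mapper-persistent-data package on Fedora) is read-only, and, assuming the pool is stopped so that the metadata device is not in use, an illustrative invocation against the device shown in the lsblk output above would be:
$ thin_check /dev/mapper/stratis-1-private-7e18ddcd99244c92b926100a7498630b-flex-thinmeta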
@mulkieran apologies for the late reply, I was unavailable yesterday.
Good news is that your suggestions above seem to have worked.
Remounting the filesystem has succeeded and the drive is accessible again. Thank you very very much for the help with this issue 🙏
Just a few remaining questions:
@mingnus re:
No, AFAIR no other tools have been used to manipulate the filesystem
Thanks for uploading the data. I filed a PR upstream to address the missing long options; thanks for mentioning that.
Thank you for the follow-up @mulkieran.
The thinmeta.pack.tar.gz file is attached (I needed to tar.gz it to upload a GitHub-supported file type).
Additionally, as an aside, I see something odd with the thin_metadata_pack command: --input is documented in the man page as you've given in your example, but I needed to use the short flag form, -i.
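For anyone reproducing the packing step, the invocation (using the short flag that worked, with the metadata device path from the lsblk output above; the output filename is arbitrary) would look roughly like:
$ thin_metadata_pack -i /dev/mapper/stratis-1-private-7e18ddcd99244c92b926100a7498630b-flex-thinmeta -o thinmeta.pack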
@erickj The COPR package is masquerading as a pre-release of 3.6.3, so I believe that if you update the package you will get the regular released package back. But it is quite acceptable to keep running with this test package: except for the change that we're taking advantage of, it is indistinguishable in its behavior from the regularly released package.
@erickj What is happening is that the thin_check call failed, and consequently a thin_repair action was initiated. The thin_repair action was making use of your backup metadata device, and thin_repair seems to have reported that device as too small, almost certainly because it is too small (16 M).
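For completeness, a manual repair onto a target that is certainly large enough would look roughly like the following. This is only a sketch: stratisd normally drives thin_repair itself, the input device path is taken from the lsblk output above, and the 3G output file is sized to comfortably exceed the 2.8G metadata device rather than the 16M spare:
$ truncate -s 3G /tmp/thinmeta.repaired
$ thin_repair -i /dev/mapper/stratis-1-private-7e18ddcd99244c92b926100a7498630b-flex-thinmeta -o /tmp/thinmeta.repaired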