bosh: Resizing persistent disk and vm_type fails in one single deploy execution
Describe the bug
We are trying to change both the vm_type and the persistent disk type of a single instance in one deploy, but the deploy fails with the following error:
Error: Action Failed get_task: Task 9e2690af-bb70-4516-7b1c-fe55ea420740 result: Migrating persistent disk: Remounting persistent disk as readonly: Shelling out to mount: Running command: 'mount /dev/sdc1 /var/vcap/store -o ro', stdout: '', stderr: 'mount: /var/vcap/store: /dev/sdc1 already mounted or mount point busy.
': exit status 32
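For context, a quick way to see what is keeping the mount point busy on the affected VM (a diagnostic sketch, assuming bosh ssh access; the deployment and instance names are illustrative):

# SSH into the affected instance
bosh -d debug ssh debug/0

# On the VM: confirm what is currently mounted at the persistent-disk mount point
sudo findmnt /var/vcap/store

# List the processes holding /var/vcap/store open, which would block a read-only remount
sudo fuser -vm /var/vcap/store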
To Reproduce
Steps to reproduce the behavior (example; a CLI sketch follows the list):
- Deploy a BOSH director
- Upload a stemcell and update the cloud-config
- Deploy the manifest
- Change the vm_type and persistent_disk_type in the manifest
- Deploy the modified manifest
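A minimal sketch of the corresponding CLI calls (file and deployment names are illustrative):

# Upload a stemcell and the cloud-config to an existing director
bosh upload-stemcell stemcell.tgz
bosh update-cloud-config cloud-config.yml

# Initial deploy with the original vm_type / persistent_disk_type
bosh -d debug deploy manifest.yml

# Edit manifest.yml to point at the larger vm_type / persistent_disk_type, then redeploy
bosh -d debug deploy manifest.yml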
Expected behavior
The VM and the attached persistent disk are resized.
Logs
Using deployment 'debug'
instance_groups:
- name: debug
- persistent_disk_type: service_fabrik_hdd_1gb
+ persistent_disk_type: service_fabrik_hdd_2gb
- vm_type: service_fabrik_vm_micro
+ vm_type: service_fabrik_vm_small
Continue? [yN]: y
Task 1011481
Task 1011481 | 10:47:02 | Preparing deployment: Preparing deployment (00:00:00)
Task 1011481 | 10:47:02 | Preparing deployment: Rendering templates (00:00:00)
Task 1011481 | 10:47:02 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 1011481 | 10:47:03 | Updating instance debug: debug/620ce8c5-c534-4778-ab05-e04a0f490df6 (0) (canary)
Task 1011481 | 10:47:03 | L executing pre-stop: debug/620ce8c5-c534-4778-ab05-e04a0f490df6 (0) (canary)
Task 1011481 | 10:47:03 | L executing drain: debug/620ce8c5-c534-4778-ab05-e04a0f490df6 (0) (canary)
Task 1011481 | 10:47:04 | L stopping jobs: debug/620ce8c5-c534-4778-ab05-e04a0f490df6 (0) (canary)
Task 1011481 | 10:47:05 | L executing post-stop: debug/620ce8c5-c534-4778-ab05-e04a0f490df6 (0) (canary) (00:05:14)
L Error: Action Failed get_task: Task 9e2690af-bb70-4516-7b1c-fe55ea420740 result: Migrating persistent disk: Remounting persistent disk as readonly: Shelling out to mount: Running command: 'mount /dev/sdc1 /var/vcap/store -o ro', stdout: '', stderr: 'mount: /var/vcap/store: /dev/sdc1 already mounted or mount point busy.
': exit status 32
Task 1011481 | 10:52:17 | Error: Action Failed get_task: Task 9e2690af-bb70-4516-7b1c-fe55ea420740 result: Migrating persistent disk: Remounting persistent disk as readonly: Shelling out to mount: Running command: 'mount /dev/sdc1 /var/vcap/store -o ro', stdout: '', stderr: 'mount: /var/vcap/store: /dev/sdc1 already mounted or mount point busy.
': exit status 32
Task 1011481 Started Thu Mar 9 10:47:02 UTC 2023
Task 1011481 Finished Thu Mar 9 10:52:17 UTC 2023
Task 1011481 Duration 00:05:15
Task 1011481 error
Updating deployment:
Expected task '1011481' to succeed but state is 'error'
Exit code 1
Versions:
- Infrastructure: AWS, GCP, Azure
- BOSH version: 277.2
- BOSH CLI version: 7.0.1
- Stemcell version: ubuntu-jammy/1.83
Deployment info
Manifest:
name: debug
instance_groups:
- azs: [z1]
  instances: 1
  jobs: []
  name: debug
  persistent_disk_type: <disk_type>
  networks:
  - name: sf_compilation
  stemcell: default
  vm_type: <vm_type>
  env:
    bosh:
      # c1oudc0w
      password: "$6$3RO2Vvl4EXS2TMRD$IaNjbMHYCSBiQLQr0PKK8AdfDHTsNunqh3kO7USouNS/tWAvH0JmtDfrhLlHwN0XUCUrBVpQ02hoHYgTdaaeY1"
      authorized_keys: [((ssh.public_key))]
      remove_static_libraries: true
releases: []
variables:
- name: ssh
  type: ssh
stemcells:
- alias: default
  os: ubuntu-jammy
  version: latest
update:
  canaries: 2
  canary_watch_time: 5000-60000
  max_in_flight: 1
  update_watch_time: 5000-60000
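For reference, the <vm_type> and <disk_type> placeholders above resolve to entries in the cloud-config. A minimal sketch of how such entries could look (the disk sizes and cloud_properties below are assumptions, not the actual cloud-config):

# Illustrative cloud-config entries only; a real cloud-config also needs
# azs, networks and compilation sections, so do not upload this file as-is.
cat > cloud-config-types.yml <<'EOF'
disk_types:
- name: service_fabrik_hdd_1gb
  disk_size: 1024
- name: service_fabrik_hdd_2gb
  disk_size: 2048
vm_types:
- name: service_fabrik_vm_micro
  cloud_properties: {instance_type: t3.micro}  # IaaS-specific, illustrative
- name: service_fabrik_vm_small
  cloud_properties: {instance_type: t3.small}  # IaaS-specific, illustrative
EOF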
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 21 (20 by maintainers)
Commits related to this issue
- Persistent disk resizing works after agent restart Related issue: https://github.com/cloudfoundry/bosh/issues/2433 If the agent has been restarted, attempting to remount the persistent disk as read-... — committed to cloudfoundry/bosh-linux-stemcell-builder by cunnie a year ago
- Add systemd config to prevent chronyd from locking bind mounts Issue: https://github.com/cloudfoundry/bosh/issues/2433 Chrony's systemd config uses ReadWritePaths to limit the files it has access to... — committed to cloudfoundry/bosh-linux-stemcell-builder by jpalermo a year ago
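The chrony-related commits above point at systemd sandboxing (ReadWritePaths) pinning the bind mount; one way to inspect this on an affected VM (a diagnostic sketch, not the actual fix; the unit name may differ per stemcell):

# Show chrony's effective unit file and its ReadWritePaths sandboxing
systemctl cat chrony.service
systemctl show chrony.service -p ReadWritePaths

# ReadWritePaths gives the unit its own mount namespace; an inode different
# from PID 1's means chronyd holds its own copy of the mount table and can
# keep /var/vcap/store busy after the agent restarts
sudo readlink /proc/1/ns/mnt /proc/$(pidof chronyd)/ns/mnt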
@jpalermo I do remember this, yes. It took a long time to find the cause, but it seems @cunnie removed it in this commit https://github.com/cloudfoundry/bosh-linux-stemcell-builder/commit/248de3dbbdfb1db84cf07f99dcedcc3f6287855d. Lucky for us, he is pretty descriptive in his commit messages 😃
@beyhan, thanks for clarifying the questions for @Malsourie. Regarding this finding:
and
The intention of the permanent_nats_credentials column in the VM table is to confirm that, for that particular VM, the short-lived NATS credentials have been rotated, and to know whether that VM is now using the permanent credentials or not. This flag is different from the global flag received in the manifest, enable_short_lived_nats_bootstrap_credentials, which indicates whether new deployments will have the new feature (rotating the short-lived bootstrap credentials) enabled or not. Thanks for noticing that this could be ambiguous. As mentioned by @jpalermo, the problem when remounting the disk is not directly caused by the new feature but by the restart of the agent that it causes. We are looking into it.