bosh: Resizing persistent disk and vm_type fails in a single deploy execution

Describe the bug We are trying to resize the vm_type and persistent disk type of a single instance in one deploy, but the deploy fails with: Error: Action Failed get_task: Task 9e2690af-bb70-4516-7b1c-fe55ea420740 result: Migrating persistent disk: Remounting persistent disk as readonly: Shelling out to mount: Running command: 'mount /dev/sdc1 /var/vcap/store -o ro', stdout: '', stderr: 'mount: /var/vcap/store: /dev/sdc1 already mounted or mount point busy.
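
For context (not part of the original report), one way to see what keeps the mount point busy on the affected VM is roughly the following; the availability of fuser/lsof on the stemcell is an assumption:

# SSH into the failing instance (deployment and instance names from this report)
bosh -d debug ssh debug/0

# On the VM: confirm the persistent disk is still mounted read-write
sudo mount | grep /var/vcap/store

# List the processes holding the mount point open
sudo fuser -vm /var/vcap/store
sudo lsof /var/vcap/store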

To Reproduce Steps to reproduce the behavior (a CLI sketch follows this list):

  1. Deploy a bosh director
  2. Upload stemcell and cloud-config
  3. Deploy manifest
  4. Change the vm_type and disk_type in the manifest
  5. Deploy modified manifest
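
A minimal CLI sequence for the steps above might look like this (file and stemcell names are placeholders, not taken from the report):

# 1. Assumes a director is already deployed and targeted (BOSH_ENVIRONMENT set)
# 2. Upload stemcell and cloud-config
bosh upload-stemcell ubuntu-jammy-1.83-stemcell.tgz
bosh update-cloud-config cloud-config.yml
# 3. Initial deploy with the original vm_type / persistent_disk_type
bosh -d debug deploy manifest.yml
# 4./5. Edit vm_type and persistent_disk_type in manifest.yml, then redeploy
bosh -d debug deploy manifest.yml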

Expected behavior The VM and its attached persistent disk are resized.

Logs

Using deployment 'debug'

  instance_groups:
  - name: debug
-   persistent_disk_type: service_fabrik_hdd_1gb
+   persistent_disk_type: service_fabrik_hdd_2gb
-   vm_type: service_fabrik_vm_micro
+   vm_type: service_fabrik_vm_small

Continue? [yN]: y

Task 1011481

Task 1011481 | 10:47:02 | Preparing deployment: Preparing deployment (00:00:00)
Task 1011481 | 10:47:02 | Preparing deployment: Rendering templates (00:00:00)
Task 1011481 | 10:47:02 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 1011481 | 10:47:03 | Updating instance debug: debug/620ce8c5-c534-4778-ab05-e04a0f490df6 (0) (canary)
Task 1011481 | 10:47:03 | L executing pre-stop: debug/620ce8c5-c534-4778-ab05-e04a0f490df6 (0) (canary)
Task 1011481 | 10:47:03 | L executing drain: debug/620ce8c5-c534-4778-ab05-e04a0f490df6 (0) (canary)
Task 1011481 | 10:47:04 | L stopping jobs: debug/620ce8c5-c534-4778-ab05-e04a0f490df6 (0) (canary)
Task 1011481 | 10:47:05 | L executing post-stop: debug/620ce8c5-c534-4778-ab05-e04a0f490df6 (0) (canary) (00:05:14)
                        L Error: Action Failed get_task: Task 9e2690af-bb70-4516-7b1c-fe55ea420740 result: Migrating persistent disk: Remounting persistent disk as readonly: Shelling out to mount: Running command: 'mount /dev/sdc1 /var/vcap/store -o ro', stdout: '', stderr: 'mount: /var/vcap/store: /dev/sdc1 already mounted or mount point busy.
': exit status 32
Task 1011481 | 10:52:17 | Error: Action Failed get_task: Task 9e2690af-bb70-4516-7b1c-fe55ea420740 result: Migrating persistent disk: Remounting persistent disk as readonly: Shelling out to mount: Running command: 'mount /dev/sdc1 /var/vcap/store -o ro', stdout: '', stderr: 'mount: /var/vcap/store: /dev/sdc1 already mounted or mount point busy.
': exit status 32

Task 1011481 Started  Thu Mar  9 10:47:02 UTC 2023
Task 1011481 Finished Thu Mar  9 10:52:17 UTC 2023
Task 1011481 Duration 00:05:15
Task 1011481 error

Updating deployment:
  Expected task '1011481' to succeed but state is 'error'

Exit code 1

Versions (please complete the following information):

  • Infrastructure: AWS, GCP, Azure
  • BOSH version: 277.2
  • BOSH CLI version: 7.0.1
  • Stemcell version: ubuntu-jammy/1.83

Deployment info: manifest used for this deploy (a sketch of the cloud-config types it references follows below):

name: debug
instance_groups:
- azs: [z1]
  instances: 1
  jobs: []
  name: debug
  persistent_disk_type: <disk_type>
  networks:
  - name: sf_compilation
  stemcell: default
  vm_type: <vm_type>
  env:
    bosh:
      # c1oudc0w
      password: "$6$3RO2Vvl4EXS2TMRD$IaNjbMHYCSBiQLQr0PKK8AdfDHTsNunqh3kO7USouNS/tWAvH0JmtDfrhLlHwN0XUCUrBVpQ02hoHYgTdaaeY1"
      authorized_keys: [((ssh.public_key))]
      remove_static_libraries: true
releases: []
variables:
- name: ssh
  type: ssh
stemcells:
- alias: default
  os: ubuntu-jammy
  version: latest
update:
  canaries: 2
  canary_watch_time: 5000-60000
  max_in_flight: 1
  update_watch_time: 5000-60000
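
For reference, the disk and VM types swapped in the diff above are defined in the cloud-config. A hedged sketch of what those entries might look like (disk sizes in MB and the empty cloud_properties are assumptions; azs, networks and compilation are omitted):

# Hypothetical type definitions; merge into the full cloud-config before uploading
cat > cloud-config-types.yml <<'EOF'
disk_types:
- name: service_fabrik_hdd_1gb
  disk_size: 1024
- name: service_fabrik_hdd_2gb
  disk_size: 2048
vm_types:
- name: service_fabrik_vm_micro
  cloud_properties: {}  # IaaS-specific instance settings go here
- name: service_fabrik_vm_small
  cloud_properties: {}
EOF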

Most upvoted comments

@jpalermo I do remember this, yes. It took a long time to find the cause, but it seems @cunnie removed it in this commit https://github.com/cloudfoundry/bosh-linux-stemcell-builder/commit/248de3dbbdfb1db84cf07f99dcedcc3f6287855d. Lucky for us he is pretty descriptive in his commit messages 😃

@beyhan, thanks for clarifying the questions for @Malsourie. Regarding this finding:

They should provide answers to the questions above. We seem to have an issue with the rotation because permanent_nats_credentials has the value of the feature configuration, which is true when the feature is activated, but the table column has an inverted logic. @nouseforaname can you confirm this?

and

From my understanding, if we activate the feature enable_short_lived_nats_bootstrap_credentials, shouldn't the column permanent_nats_credentials be false according to the feature and column name? But indeed they always have the same boolean value. In the code, the certificate {agent_id}.bootstrap.agent.bosh-internal is created when a VM is created:

The intention of the permanent_nats_credentials column in the VM table is to record, for that particular VM, whether the short-lived NATS credentials have been rotated, i.e. whether that VM is now using the permanent credentials or not. This flag is different from the global flag received in the manifest, enable_short_lived_nats_bootstrap_credentials, which indicates whether new deployments will have the new feature (rotating the short-lived bootstrap credential) enabled. Thanks for noticing that this could be ambiguous. As mentioned by @jpalermo, the problem when remounting the disk is not directly caused by the new feature but by the agent restart that it triggers. We are looking into it.
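
To make the distinction concrete (a hedged sketch, not from the thread): the per-VM flag lives in the director database and can be inspected there, while enable_short_lived_nats_bootstrap_credentials is a global director property. The database name, host, and credentials below are assumptions and vary by director setup; only the table and column names come from the discussion above:

# Run on the director VM; adjust connection details for your installation
psql -h 127.0.0.1 -U postgres -d bosh \
  -c "SELECT cid, permanent_nats_credentials FROM vms;"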