ganeti: gnt-instance replace-disks -s / -a fails if vm is running and secondary lvm-vol is gone

Originally reported on Google Code with ID 944.

<b>What software version are you running? Please provide the output of
"gnt-cluster --version", "gnt-cluster version", and "hspace --version".</b>

# gnt-cluster --version
gnt-cluster (ganeti v2.11.5) 2.11.5
# gnt-cluster version
Software version: 2.11.5
Internode protocol: 2110000
Configuration format: 2110000
OS api version: 20
Export interface: 0
VCS version: (ganeti) version v2.11.5
# hspace --version
hspace (ganeti) version v2.11.5
compiled with ghc 7.4
running on linux x86_64

<b>What distribution are you using?</b>
# cat /etc/debian_version
7.6
# apt-cache policy ganeti
ganeti:
  Installed: 2.11.5-1~bpo70+1
  Candidate: 2.11.5-1~bpo70+1
  Package pin: 2.11.5-1~bpo70+1
  Version table:
 *** 2.11.5-1~bpo70+1 990
        100 http://debian.xxxxxxxx.de/debian/ wheezy-backports/main amd64 Packages
        100 /var/lib/dpkg/status
     2.10.5-1~bpo70+1 990
        100 http://debian.xxxxxxxx.de/debian/ wheezy-backports/main amd64 Packages
     2.9.5-1~bpo70+1 990
        100 http://debian.xxxxxxxx.de/debian/ wheezy-backports/main amd64 Packages


<b>What steps will reproduce the problem?</b>
1. shut down the secondary node
2. replace the failed disk
3. sfdisk /dev/sdc < part.sfdisk
4. vgreduce --force --removemissing lvm
5. vgextend lvm /dev/sdc6
6. gnt-node modify -O no node1
7. gnt-instance activate-disks zzzzzz
8. gnt-instance replace-disks -a zzzzzz
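
For context, the same sequence as a consolidated, commented sketch as it might be run on the secondary node. The device (/dev/sdc6), VG name (lvm), node (node1) and instance (zzzzzz) are taken from the steps above; the meaning of part.sfdisk (a dump of the old partition table) and the flag comments are assumptions, so verify them against the actual setup:

  # restore the partition layout onto the replacement disk
  # (part.sfdisk is assumed to be an "sfdisk -d" dump of the old disk)
  sfdisk /dev/sdc < part.sfdisk

  # drop the vanished PV from the VG, then add the new partition to it
  vgreduce --force --removemissing lvm
  vgextend lvm /dev/sdc6

  # mark the node as online again (-O is the --offline flag of gnt-node modify)
  gnt-node modify -O no node1

  # try to reassemble the instance's disks and rebuild the secondary
  gnt-instance activate-disks zzzzzz
  gnt-instance replace-disks -a zzzzzz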

<b>What is the expected output? What do you see instead?</b>
activate-disks:
  Sun Sep 14 20:36:51 2014  - WARNING: Could not prepare block device disk/1 on node node1.yyyyyy.xxxxxxxxx.de (is_primary=False, pass=1): Error while assembling disk: drbd23: can't set the synchronization parameters: Can't change syncer rate: exited with exit code 10 - 23:
  Failure: (127) Device minor not allocated\nadditional info from kernel:\nunknown minor\n
  Failure: command execution error:
  Cannot activate block devices

replace-disks:
  Sun Sep 14 20:37:09 2014  - INFO: Checking disk/0 on node2.yyyyyy.xxxxxxxxx.de
  Sun Sep 14 20:37:09 2014  - INFO: Checking disk/0 on node1.yyyyyy.xxxxxxxxx.de
  Sun Sep 14 20:37:10 2014  - INFO: Checking disk/1 on node2.yyyyyy.xxxxxxxxx.de
  Sun Sep 14 20:37:11 2014  - INFO: Checking disk/1 on node1.yyyyyy.xxxxxxxxx.de
  Failure: prerequisites not met for this operation:
  error type: wrong_state, error details:
  Please run activate-disks on instance zzzzzz.yyyyyy.xxxxxxxxx.de first


<b>Please provide any additional information below.</b>
replace-disks -s doesn't work either, as the VM is still up:

root@node2 ~ # gnt-instance replace-disks -s zzzzzz
Sun Sep 14 20:35:01 2014 Replacing disk(s) 0, 1 for instance 'zzzzzz.yyyyyy.xxxxxxxxx.de'
Sun Sep 14 20:35:01 2014 Current primary node: node2.yyyyyy.xxxxxxxxx.de
Sun Sep 14 20:35:01 2014 Current secondary node: node1.yyyyyy.xxxxxxxxx.de
Sun Sep 14 20:35:16 2014  - WARNING: Could not prepare block device disk/1 on node node1.yyyyyy.xxxxxxxxx.de (is_primary=False, pass=1): Error while assembling disk: drbd23: can't set the synchronization parameters: Can't change syncer rate: exited with exit code 10 - 23: 
Failure: (127) Device minor not allocated\nadditional info from kernel:\nunknown minor\n
Sun Sep 14 20:35:17 2014 STEP 1/6 Check device existence
Sun Sep 14 20:35:17 2014  - INFO: Checking disk/0 on node2.yyyyyy.xxxxxxxxx.de
Sun Sep 14 20:35:17 2014  - INFO: Checking disk/0 on node1.yyyyyy.xxxxxxxxx.de
Sun Sep 14 20:35:25 2014  - INFO: Checking disk/1 on node2.yyyyyy.xxxxxxxxx.de
Sun Sep 14 20:35:26 2014  - INFO: Checking disk/1 on node1.yyyyyy.xxxxxxxxx.de
Sun Sep 14 20:35:33 2014  - INFO: Checking disk/0 on node2.yyyyyy.xxxxxxxxx.de
Sun Sep 14 20:35:33 2014  - INFO: Checking disk/0 on node1.yyyyyy.xxxxxxxxx.de
Sun Sep 14 20:35:38 2014  - INFO: Checking disk/1 on node2.yyyyyy.xxxxxxxxx.de
Sun Sep 14 20:35:39 2014  - INFO: Checking disk/1 on node1.yyyyyy.xxxxxxxxx.de
Failure: prerequisites not met for this operation:
error type: wrong_state, error details:
Instance 'zzzzzz.yyyyyy.xxxxxxxxx.de' is marked to be up, cannot shutdown disks


I then simply re-created the missing disks by hand with lvcreate [...]
Activated the disks (they were marked as UpToDate/UpToDate!)
So I then ran a replace-disks -s [...]
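
For reference, the manual recovery probably looked roughly like the sketch below. The LV names and sizes are placeholders (Ganeti names the DRBD backing LVs "<uuid>.diskN_data" / "<uuid>.diskN_meta"; the real names and sizes have to be taken from "gnt-instance info" on the affected instance), so this is only an illustration of the idea, not the exact commands used:

  # recreate the missing data and metadata LVs on the new PV in VG "lvm";
  # names and sizes below are placeholders taken from "gnt-instance info"
  lvcreate -L 10g  -n aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.disk0_data lvm
  lvcreate -L 128m -n aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.disk0_meta lvm

  # bring the disks back up and resync onto the recreated volumes
  gnt-instance activate-disks zzzzzz
  gnt-instance replace-disks -s zzzzzz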

Originally added on 2014-09-14 19:05:58 +0000 UTC.

About this issue

  • Original URL
  • State: open
  • Created 7 years ago
  • Reactions: 2
  • Comments: 16 (1 by maintainers)

Most upvoted comments

It sounds like you have a two-node cluster. I think the following workaround can work (a command-level sketch follows the list):

  • mark the secondary node as offline (it should have no primary instances running)
  • rename the secondary node to another name (hostname and DNS)
  • add the newly named node as a 3rd node to the cluster
  • then try replace-disks with --new-secondary (for all affected instances)
  • finally remove the offlined node
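
A command-level sketch of that workaround; node3 stands for the renamed host, zzzzzz for one affected instance, and the exact flags (e.g. whether --secondary-ip is needed on gnt-node add) depend on the cluster, so treat this as an outline rather than a recipe:

  # take the broken secondary out of the cluster's view
  gnt-node modify -O yes node1

  # after changing hostname and DNS, add the machine back under its new name
  gnt-node add node3

  # move the secondary of each affected instance onto the new node
  gnt-instance replace-disks --new-secondary node3 zzzzzz

  # once nothing references the old node any more, drop it
  gnt-node remove node1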