terraform-provider-proxmox: Unable to clone vm, looks like a timeout
I’m using Proxmox 6.3-2 and provider 2.9.0
```
proxmox_vm_qemu.resource-name: Still creating... [2m30s elapsed]
╷
│ Error: vm locked, could not obtain config
│
│   with proxmox_vm_qemu.resource-name,
│   on hosts.tf line 1, in resource "proxmox_vm_qemu" "resource-name":
│    1: resource "proxmox_vm_qemu" "resource-name" {
│
╵
```
But in proxmox logs I can see:
```
transferred: 8306819 bytes remaining: 2178941 bytes total: 10485760 bytes progression: 79.22 %
```
About this issue
- State: closed
- Created 3 years ago
- Reactions: 2
- Comments: 22
Reproduced on v2.9.10.
Tried adjusting the resource create timeouts and it didn't seem to have any effect. Here are several test runs after changing those timeouts; I did not change anything at all between runs:
- `vm locked, could not obtain config` error
- `Error: file provisioner error` because the file provisioner could not connect: "timeout - last error: dial tcp 192.168.1.78:22: connect: no route to host"
- Creation complete after 6m21s
- `vm locked, could not obtain config`
- `terraform destroy` thought there was nothing to clean up, resulting in `Error: 500 can't lock file '/var/lock/qemu-server/lock-135.conf' - got timeout` on creation
- Creation complete after 6m17s
- Creation complete after 6m9s
- `vm locked, could not obtain config`
- `terraform destroy` thought there was nothing to clean up, resulting in `Error: file provisioner error` because the file provisioner could not connect: "timeout - last error: dial tcp 192.168.1.78:22: connect: no route to host"
- Creation complete after 6m28s

So that gives some concrete data on the different types of errors that can come up and how often they appear. The issue with `terraform destroy` incorrectly thinking there's nothing to destroy might be a separate issue; if I can dig up more details on that, I'll open a new ticket if one does not already exist. I suspect all the other failures are related to this timeout problem.

People seem to be mentioning setting both `pm_timeout` and `PM_TIMEOUT` to work around this issue. In case anyone in the future is confused about which is the correct environment variable to use, it is `PM_TIMEOUT`. It is referred to as `pm_timeout` in the documentation (similar to how the `pm_api_url` value is set by the `PM_API_URL` environment variable). See the sketch just below for both forms.

Similar to what others are reporting, I found that setting `PM_TIMEOUT=600` seems to make everything completely stable. I've redeployed several times in a row without any failures, so this seems like a solid workaround. In conclusion, I very much look forward to the fix with proper Go-style waits!
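To make the two equivalent forms concrete, here is a minimal sketch assuming the Telmate/proxmox provider; the API URL is illustrative and credentials are omitted:

```hcl
# Option 1: set the timeout (in seconds) in the provider block.
provider "proxmox" {
  pm_api_url = "https://proxmox.example.com:8006/api2/json" # illustrative
  pm_timeout = 600
  # credentials omitted for brevity
}

# Option 2: set it via the environment variable instead:
#   export PM_TIMEOUT=600
#   terraform apply
```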
Update: I just got a `The plugin.(*GRPCProvider).ApplyResourceChange request was cancelled.` error with `PM_TIMEOUT` set to 600 (and it happened at 5m20s). So apparently either this workaround isn't bulletproof, or there's an additional issue causing trouble. Also, after that error the VM still existed in Proxmox, but Terraform thought there was nothing to destroy. The workaround for that is to just delete the VM manually in Proxmox.

Update 2: The VM is being cloned from server_1 (which is where the template is located) and deployed onto server_2 (for no particular reason). If I change the Terraform file to set `target_node = "server_1"`, the `PM_TIMEOUT` workaround appears to be a more stable fix (maybe 100% stable?). In my case the backing store is a Ceph cluster that is available on both servers. I wanted to mention this because it will probably affect the fix (waiting for the VM to be on the right node, not just for the clone operation to complete).
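For anyone hitting the same cross-node case, here is a sketch of the same-node clone described above; the template name is hypothetical and other attributes are trimmed down:

```hcl
resource "proxmox_vm_qemu" "resource-name" {
  name        = "resource-name"
  # Clone onto the node that already holds the template, so Terraform
  # is not also waiting for the VM to land on a different node.
  target_node = "server_1"
  clone       = "my-template" # hypothetical template name
  full_clone  = true
}
```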
PVE 7.2-7; plugin v2.9.11; cloud-init image size is 2252M.
Got a timeout every time when `pm_parallel > 3`. Setting `export PM_TIMEOUT` and `pm_timeout` doesn't seem to have any effect.

I struggled with this issue for a while and was able to fix it temporarily by raising the provider's `pm_timeout` parameter. For 3 `proxmox_vm_qemu` clones, setting it to 600 was enough.

This issue is still present in 2.9.1, 2.9.2, and 2.9.3. If a Proxmox clone task takes longer than 5 minutes and 20 seconds, Terraform will try to send the config file, which gives the error:
```
Error: vm locked, could not obtain config
```
A simple fix is to go back to 2.8.0, which still works.
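Putting the remaining workarounds from this thread together, a sketch of a pinned, throttled configuration might look like the following; the API URL is illustrative and the values come from the reports above:

```hcl
terraform {
  required_providers {
    proxmox = {
      source  = "Telmate/proxmox"
      # Pin to 2.8.0, the last version reported in this thread
      # without the ~5m20s clone timeout.
      version = "= 2.8.0"
    }
  }
}

provider "proxmox" {
  pm_api_url  = "https://proxmox.example.com:8006/api2/json" # illustrative
  pm_parallel = 3 # timeouts were reported above whenever this exceeded 3
}
```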