kops: Changes to new_cluster.go break OpenStack creation (and others?)
/kind bug
1. What kops version are you running? The command kops version will display
this information.
v1.25.2
2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.
v1.25.x
3. What cloud provider are you using? OpenStack
4. What commands did you run? What is the simplest way to reproduce this issue?
kops create cluster according to https://kops.sigs.k8s.io/getting_started/openstack/
5. What happened after the commands executed? Panic, stack trace:
I1114 10:58:21.505521 36 create_cluster.go:831] Using SSH public key: /path/ssh_id_rsa.pub
W1114 10:58:22.587804 36 new_cluster.go:887] Running with masters in the same AZs; redundancy will be reduced
I1114 10:58:22.587876 36 new_cluster.go:1279] Cloud Provider ID = openstack
panic: interface conversion: interface is nil, not openstack.OpenstackCloud
goroutine 1 [running]:
k8s.io/kops/upup/pkg/fi/cloudup.defaultMachineType({0x0?, 0x0}, 0xc000b36f00, 0xc000600800)
k8s.io/kops/upup/pkg/fi/cloudup/populate_instancegroup_spec.go:330 +0x4b6
k8s.io/kops/upup/pkg/fi/cloudup.NewCluster(0xc0005f9200, {0x5407080, 0xc00096a180})
k8s.io/kops/upup/pkg/fi/cloudup/new_cluster.go:423 +0x1ca5
main.RunCreateCluster({0x53f5c40, 0xc00019c008}, 0x14?, {0x53cd760, 0xc000200008}, 0xc0005f9200)
k8s.io/kops/cmd/kops/create_cluster.go:517 +0x33f
main.NewCmdCreateCluster.func1(0xc000b34500?, {0xc0005f5440?, 0x24?, 0x24?})
k8s.io/kops/cmd/kops/create_cluster.go:203 +0x177
github.com/spf13/cobra.(*Command).execute(0xc000b34500, {0xc0005f5200, 0x24, 0x24})
github.com/spf13/cobra@v1.5.0/command.go:872 +0x694
github.com/spf13/cobra.(*Command).ExecuteC(0x76904a0)
github.com/spf13/cobra@v1.5.0/command.go:990 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
github.com/spf13/cobra@v1.5.0/command.go:918
main.Execute()
k8s.io/kops/cmd/kops/root.go:95 +0x5c
main.main()
k8s.io/kops/cmd/kops/main.go:20 +0x17
6. What did you expect to happen? Creation of a cluster without problems.
7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.
8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.
9. Anything else we need to know? There were no problems with kOps 1.24.x; this started with 1.25.x. I traced the panic to https://github.com/kubernetes/kops/blob/v1.25.2/upup/pkg/fi/cloudup/new_cluster.go#L423, which references cloud. At that point cloud is nil: it is declared at https://github.com/kubernetes/kops/blob/v1.25.2/upup/pkg/fi/cloudup/new_cluster.go#L276 but never assigned a value. The only cloud that appears to get a value assigned is AWS, at https://github.com/kubernetes/kops/blob/v1.25.2/upup/pkg/fi/cloudup/new_cluster.go#L286.
The nil cloud matters because the kops create cluster command line cannot assign a machine type to bastion hosts; it can only assign machine types to masters and nodes. kOps therefore has to choose a machine type itself, so it calls defaultMachineType, which requires a cloud instance.
It appears this part of the code went through quite a large refactor between 1.24.x and 1.25.x and now it’s broken at least on OpenStack. I’ve tried two different OpenStack cloud providers.
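For anyone reading along, here is a minimal, self-contained Go sketch of the failure mode described above. It is not the kOps code: the Cloud and OpenstackCloud interfaces, fakeOpenstack, the DefaultFlavor method, and the "m1.medium" value are all hypothetical stand-ins. It only illustrates why an unchecked type assertion on an interface variable that was declared but never assigned panics with "interface conversion: interface is nil", and how the comma-ok form turns that into an ordinary error.

```go
package main

import (
	"errors"
	"fmt"
)

// Cloud is a hypothetical stand-in for a generic cloud interface.
type Cloud interface {
	ProviderID() string
}

// OpenstackCloud is a hypothetical provider-specific interface.
type OpenstackCloud interface {
	Cloud
	DefaultFlavor() (string, error)
}

// fakeOpenstack is a toy implementation used only for this sketch.
type fakeOpenstack struct{}

func (fakeOpenstack) ProviderID() string { return "openstack" }

// DefaultFlavor returns a made-up placeholder flavor name.
func (fakeOpenstack) DefaultFlavor() (string, error) { return "m1.medium", nil }

// unsafeMachineType uses an unchecked type assertion.
// It panics when cloud is a nil interface value.
func unsafeMachineType(cloud Cloud) (string, error) {
	return cloud.(OpenstackCloud).DefaultFlavor()
}

// safeMachineType uses the comma-ok form, so a nil or mismatched
// cloud yields an error instead of a panic.
func safeMachineType(cloud Cloud) (string, error) {
	osCloud, ok := cloud.(OpenstackCloud)
	if !ok {
		return "", errors.New("cloud is nil or not an OpenStack cloud")
	}
	return osCloud.DefaultFlavor()
}

// panics reports whether calling f resulted in a panic.
func panics(f func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	var cloud Cloud // declared but never assigned, so it is nil

	fmt.Println("unchecked assertion panics:",
		panics(func() { _, _ = unsafeMachineType(cloud) }))

	_, err := safeMachineType(cloud)
	fmt.Println("checked assertion error:", err)

	flavor, _ := safeMachineType(fakeOpenstack{})
	fmt.Println("with an assigned cloud:", flavor)
}
```

A guard like this would only replace the panic with a readable error. Judging from the trace, the actual fix presumably still needs cloud to be assigned for non-AWS providers before defaultMachineType is called.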
About this issue
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 33 (22 by maintainers)
Let me have a look at how easy a fix this is. I unfortunately don’t have access to an OpenStack environment anymore, but hopefully this can be reproduced with integration tests.