terraform-provider-aws: Creation of aws_instance with wrong ebs_block_device disk order
This issue was originally opened by @davivcgarcia as hashicorp/terraform#18271. It was migrated here as a result of the provider split. The original body of the issue is below.
Terraform Version
$ terraform -v
Terraform v0.11.7
+ provider.aws v1.22.0
Terraform Configuration Files
resource "aws_instance" "k8s_node" {
ami = "${data.aws_ami.default.id}"
instance_type = "m5.xlarge"
key_name = "${aws_key_pair.default.key_name}"
subnet_id = "${aws_subnet.main_us-east-1a.id}"
vpc_security_group_ids = ["${aws_security_group.default.id}"]
root_block_device {
volume_size = "40"
volume_type = "standard"
}
ebs_block_device {
device_name = "/dev/sdb"
volume_size = "80"
volume_type = "standard"
}
ebs_block_device {
device_name = "/dev/sdc"
volume_size = "250"
volume_type = "standard"
}
tags {
Name = "k8s-node"
}
}
Expected Behavior
The instance should have a primary/boot disk (nvme0n1) of 40GB, a secondary disk (nvme1n1) of 80GB, and a tertiary disk (nvme2n1) of 250GB.
Actual Behavior
Terraform creates the instance with the disks in the wrong order: the secondary disk (nvme1n1) is 250GB and the tertiary disk (nvme2n1) is 80GB.
Steps to Reproduce
terraform init
terraform apply
Output
aws_instance.k8s_node: Creating...
ami: "" => "ami-950e95ea"
associate_public_ip_address: "" => "<computed>"
availability_zone: "" => "<computed>"
ebs_block_device.#: "" => "2"
ebs_block_device.2554893574.delete_on_termination: "" => "true"
ebs_block_device.2554893574.device_name: "" => "/dev/sdc"
ebs_block_device.2554893574.encrypted: "" => "<computed>"
ebs_block_device.2554893574.snapshot_id: "" => "<computed>"
ebs_block_device.2554893574.volume_id: "" => "<computed>"
ebs_block_device.2554893574.volume_size: "" => "250"
ebs_block_device.2554893574.volume_type: "" => "standard"
ebs_block_device.2576023345.delete_on_termination: "" => "true"
ebs_block_device.2576023345.device_name: "" => "/dev/sdb"
ebs_block_device.2576023345.encrypted: "" => "<computed>"
ebs_block_device.2576023345.snapshot_id: "" => "<computed>"
ebs_block_device.2576023345.volume_id: "" => "<computed>"
ebs_block_device.2576023345.volume_size: "" => "80"
ebs_block_device.2576023345.volume_type: "" => "standard"
ephemeral_block_device.#: "" => "<computed>"
get_password_data: "" => "false"
instance_state: "" => "<computed>"
instance_type: "" => "m5.xlarge"
ipv6_address_count: "" => "<computed>"
ipv6_addresses.#: "" => "<computed>"
key_name: "" => "default"
network_interface.#: "" => "<computed>"
network_interface_id: "" => "<computed>"
password_data: "" => "<computed>"
placement_group: "" => "<computed>"
primary_network_interface_id: "" => "<computed>"
private_dns: "" => "<computed>"
private_ip: "" => "<computed>"
public_dns: "" => "<computed>"
public_ip: "" => "<computed>"
root_block_device.#: "" => "1"
root_block_device.0.delete_on_termination: "" => "true"
root_block_device.0.volume_id: "" => "<computed>"
root_block_device.0.volume_size: "" => "40"
root_block_device.0.volume_type: "" => "standard"
security_groups.#: "" => "<computed>"
source_dest_check: "" => "true"
subnet_id: "" => "subnet-036d839562552db17"
tags.%: "" => "2"
tags.Name: "" => "k8s_node"
tenancy: "" => "<computed>"
volume_tags.%: "" => "<computed>"
vpc_security_group_ids.#: "" => "1"
vpc_security_group_ids.2684253548: "" => "sg-0a12ea76c68402986"
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:2 0 40G 0 disk
├─nvme0n1p1 259:3 0 1M 0 part
└─nvme0n1p2 259:4 0 40G 0 part /
nvme1n1 259:0 0 250G 0 disk
nvme2n1 259:1 0 80G 0 disk
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 4
- Comments: 21 (2 by maintainers)
Hey all, I came up with a solid solution which I’ve had in production for the last couple months. I finally had a chance to document it on my blog today, have a look and see if this helps you.
https://russell.ballestrini.net/aws-nvme-to-block-mapping/
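In case the link goes stale, the general idea (a rough sketch of the same class of approach, not necessarily the exact script from the post) is that on Nitro/NVMe instances the EBS device name requested at launch is embedded in the NVMe controller's vendor-specific identify data, which nvme-cli can read back:
#!/bin/bash
# Sketch only: assumes nvme-cli is installed. For each NVMe disk, pull the
# vendor-specific identify data, which on EBS volumes contains the device name
# requested at launch (e.g. /dev/sdb). The grep is a crude filter and may need
# adjusting for your nvme-cli version.
for dev in /dev/nvme[0-9]*n1; do
  name=$(sudo nvme id-ctrl -v "$dev" 2>/dev/null | grep -Eo '(/dev/)?(sd|xvd)[a-z]+' | head -n1)
  echo "${dev} => ${name:-unknown}"
done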
I can confirm I’m running into this as well and I don’t even use Terraform.
I’m experiencing out-of-order device names when upgrading from Ubuntu 14.04 -> 18.04 (images based off the official AMI).
For me, I only have 2 EBS block devices, a boot disk and a data disk, and even then the devices are out of order.
My provisioning system expects that /dev/nvme0n1 be root and /dev/nvme1n1 be data.
To add some data to this, here are the EBS devices in an ASG I have configured:
Here's the output for that section from aws autoscaling describe-launch-configurations; note that it's an array, and note the order it's in:
Here's the output of lsblk from a c5.large system launched using that LaunchConfig:
As you can see, the in-OS ordering reflects the ordering of the BlockDeviceMappings array, which is out of order with respect to the desired arrangement expressed in the Terraform resource. This does not happen on older instance types (e.g., c4.large) because they still adopt the naming (if not the ordering) given in the launch configuration or instance definition.
Since AWS has stopped honoring that naming convention, I would hope that Terraform could perhaps start sorting that array according to device_name so we users could have at least a somewhat predictable naming scheme.
If you know the size of each disk, you can filter for it in the user-data script using lsblk and jq.
This works, passing in the size of the EBS drives.
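For illustration, a minimal sketch of that lsblk/jq lookup (the 80G size and the variable names are examples, not from the original comment):
#!/bin/bash
# Ask lsblk for JSON and let jq pick the whole disk whose size matches the EBS
# volume declared in Terraform. Ambiguous if two disks happen to share a size.
TARGET_SIZE="80G"
DEVICE=$(lsblk --json --output NAME,SIZE,TYPE \
  | jq -r --arg size "$TARGET_SIZE" \
      '.blockdevices[] | select(.type == "disk" and .size == $size) | .name' \
  | head -n1)
echo "Disk of size ${TARGET_SIZE} is /dev/${DEVICE}"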
The upshot is that Amazon somehow considers this working as-designed. I’ve spoken with one of the Nitro engineers and, while he acknowledged that it makes life harder for users, I didn’t get the impression that they ever intend to correct this.
Their primary suggested “solution” was to use udev to order devices the way you expect. A secondary solution I started but abandoned was using snapshots of empty filesystems. The net of it is that I’ve just stopped buying as much EBS storage.
[edit] For completeness’ sake, I should point out that this “only” happens when you attach devices simultaneously, as with a Launch Config or Template. If you incrementally add devices to an instance, they attach in expected order.
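For anyone going the udev route, this is roughly the shape of it. A sketch only: the ebs-name helper below is hypothetical (Amazon Linux ships an equivalent called ebsnvme-id), and it reuses the nvme id-ctrl trick shown earlier, so nvme-cli must be present.
#!/bin/bash
# Hypothetical helper: print the EBS device name (e.g. sdb) for a given NVMe device.
cat >/usr/local/sbin/ebs-name <<'EOF'
#!/bin/bash
nvme id-ctrl -v "$1" 2>/dev/null | grep -Eo '(sd|xvd)[a-z]+' | head -n1
EOF
chmod +x /usr/local/sbin/ebs-name

# udev rule: for every NVMe disk, ask the helper for its EBS name and add a
# stable symlink (e.g. /dev/sdb -> /dev/nvme2n1) that fstab or provisioning
# scripts can rely on.
cat >/etc/udev/rules.d/70-ebs-nvme.rules <<'EOF'
KERNEL=="nvme[0-9]*n[0-9]*", ENV{DEVTYPE}=="disk", PROGRAM="/usr/local/sbin/ebs-name /dev/%k", SYMLINK+="%c"
EOF
udevadm control --reload-rules
udevadm trigger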
Bad news. I wrote a patch to switch ebs_block_devices from a set to an array on both launch configurations and instances, and found out that the client-side ordering seems not to matter at all.
It's entirely possible that I made the wrong changes, but Terraform and its internal tests seemed happy, and the output of both terraform apply and terraform show seemed to show the block devices in written order. However, in checking the BlockDeviceMappings section from the AWS API (e.g. aws ec2 describe-instances) I found that they were not arranged in the order I'd created them - in fact, create/destroy produced different results several times.
I went back to the upstream provider code (1.40) and observed similar behavior - terraform apply happened to hash my 3 devices in reverse order (3-2-1), but the order in the AWS API afterwards was 1-3-2.
I'm going to attempt to submit a bug to AWS, but would suggest those of you affected do the same. Specifically, the new NVMe instances do not follow the bus order implied by device naming, but rather order by their appearance in BlockDeviceMappings. This is exacerbated when attaching multiple devices simultaneously (as with Terraform), since they seem to be created asynchronously and attached to BlockDeviceMappings in order of completion.
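For anyone who wants to check the API-side ordering themselves, something along these lines prints the BlockDeviceMappings array in the order the API returns it (the instance ID is a placeholder):
# Compare this order against the order written in the Terraform resource.
aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
  --query 'Reservations[].Instances[].BlockDeviceMappings[].DeviceName' \
  --output text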
I'll add my voice here - it's the same for aws_launch_configuration too. It doesn't matter whether one uses the sdX or xvdX nomenclature, or what the ebs_block_device ordering is in the resource. Block-device ordering on the actual machine is consistent but out of order.
This seems to have been the case for a while. I back-revisioned to a 1.21.0 binary I had and it still creates the disks out of order. The difference is that the older instance types that still use SCSI emulation (e.g., t2.large) respected the device names Terraform provides. The new instance types that default to /dev/nvmeXn1 do not, however - they're strictly named in the order presented to the OS.
Hence if I have /dev/xvdf, /dev/xvdg, and /dev/xvdh on one of the new NVMe systems but the provider creates them in the order g-f-h (which it does consistently), they will be 2-1-3 in the OS.
This may represent a bug in both the Terraform provider and AWS - that the disks are created out of order, and that the hypervisor does not respect the requested name order.
I ran into this same issue, and discussions with AWS have uncovered that the ordering of disk device naming is not guaranteed to remain the same as defined at build time. This has to do with device discovery by the AMI; the order in which the devices are discovered determines the device names assigned.
This is definitely new behavior starting with the nvme* disks. I have had to implement some custom scripting that runs from user-data to map the devices as defined in Terraform to the actual mount points on the host. It also means you can't use /dev/nvme1n1 or similar in fstab anymore; you must use a UUID to ensure proper mounting.
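For what it's worth, a rough sketch of that user-data mapping plus UUID-based mounting (the 250G size, xfs filesystem, and /data mount point are illustrative, not from the original configs):
#!/bin/bash
# Locate the data disk by size (or by any of the mapping tricks above).
DEV=/dev/$(lsblk --json --output NAME,SIZE,TYPE \
  | jq -r '.blockdevices[] | select(.type == "disk" and .size == "250G") | .name' \
  | head -n1)

# Create a filesystem only if the disk is still blank.
blkid "$DEV" >/dev/null || mkfs -t xfs "$DEV"

# Mount by UUID so the entry survives NVMe renumbering across reboots.
UUID=$(blkid -s UUID -o value "$DEV")
mkdir -p /data
grep -q "$UUID" /etc/fstab || echo "UUID=${UUID} /data xfs defaults,nofail 0 2" >> /etc/fstab
mount -a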