terraform-provider-aws: Creation of aws_instance with ebs_block_device disks out of order

This issue was originally opened by @davivcgarcia as hashicorp/terraform#18271. It was migrated here as a result of the provider split. The original body of the issue is below.


Terraform Version

$ terraform -v
Terraform v0.11.7
+ provider.aws v1.22.0

Terraform Configuration Files


resource "aws_instance" "k8s_node" {
  ami           = "${data.aws_ami.default.id}"
  instance_type = "m5.xlarge"
  key_name      = "${aws_key_pair.default.key_name}"

  subnet_id              = "${aws_subnet.main_us-east-1a.id}"
  vpc_security_group_ids = ["${aws_security_group.default.id}"]

  root_block_device {
    volume_size = "40"
    volume_type = "standard"
  }

  ebs_block_device {
    device_name = "/dev/sdb"
    volume_size = "80"
    volume_type = "standard"
  }

  ebs_block_device {
    device_name = "/dev/sdc"
    volume_size = "250"
    volume_type = "standard"
  }

  tags {
    Name = "k8s-node"
  }
}

Expected Behavior

The instance should have a primary/boot disk (nvme0n1) of 40 GB, a secondary disk (nvme1n1) of 80 GB, and a tertiary disk (nvme2n1) of 250 GB.

Actual Behavior

Terraform creates the instance with the disks out of order: the secondary disk (nvme1n1) is 250 GB and the tertiary disk (nvme2n1) is 80 GB.

Steps to Reproduce

  1. terraform init
  2. terraform apply

Output

aws_instance.k8s_node: Creating...
  ami:                                               "" => "ami-950e95ea"
  associate_public_ip_address:                       "" => "<computed>"
  availability_zone:                                 "" => "<computed>"
  ebs_block_device.#:                                "" => "2"
  ebs_block_device.2554893574.delete_on_termination: "" => "true"
  ebs_block_device.2554893574.device_name:           "" => "/dev/sdc"
  ebs_block_device.2554893574.encrypted:             "" => "<computed>"
  ebs_block_device.2554893574.snapshot_id:           "" => "<computed>"
  ebs_block_device.2554893574.volume_id:             "" => "<computed>"
  ebs_block_device.2554893574.volume_size:           "" => "250"
  ebs_block_device.2554893574.volume_type:           "" => "standard"
  ebs_block_device.2576023345.delete_on_termination: "" => "true"
  ebs_block_device.2576023345.device_name:           "" => "/dev/sdb"
  ebs_block_device.2576023345.encrypted:             "" => "<computed>"
  ebs_block_device.2576023345.snapshot_id:           "" => "<computed>"
  ebs_block_device.2576023345.volume_id:             "" => "<computed>"
  ebs_block_device.2576023345.volume_size:           "" => "80"
  ebs_block_device.2576023345.volume_type:           "" => "standard"
  ephemeral_block_device.#:                          "" => "<computed>"
  get_password_data:                                 "" => "false"
  instance_state:                                    "" => "<computed>"
  instance_type:                                     "" => "m5.xlarge"
  ipv6_address_count:                                "" => "<computed>"
  ipv6_addresses.#:                                  "" => "<computed>"
  key_name:                                          "" => "default"
  network_interface.#:                               "" => "<computed>"
  network_interface_id:                              "" => "<computed>"
  password_data:                                     "" => "<computed>"
  placement_group:                                   "" => "<computed>"
  primary_network_interface_id:                      "" => "<computed>"
  private_dns:                                       "" => "<computed>"
  private_ip:                                        "" => "<computed>"
  public_dns:                                        "" => "<computed>"
  public_ip:                                         "" => "<computed>"
  root_block_device.#:                               "" => "1"
  root_block_device.0.delete_on_termination:         "" => "true"
  root_block_device.0.volume_id:                     "" => "<computed>"
  root_block_device.0.volume_size:                   "" => "40"
  root_block_device.0.volume_type:                   "" => "standard"
  security_groups.#:                                 "" => "<computed>"
  source_dest_check:                                 "" => "true"
  subnet_id:                                         "" => "subnet-036d839562552db17"
  tags.%:                                            "" => "2"
  tags.Name:                                         "" => "k8s_node"
  tenancy:                                           "" => "<computed>"
  volume_tags.%:                                     "" => "<computed>"
  vpc_security_group_ids.#:                          "" => "1"
  vpc_security_group_ids.2684253548:                 "" => "sg-0a12ea76c68402986"
$ lsblk 
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
nvme0n1     259:2    0   40G  0 disk 
├─nvme0n1p1 259:3    0    1M  0 part 
└─nvme0n1p2 259:4    0   40G  0 part /
nvme1n1     259:0    0  250G  0 disk 
nvme2n1     259:1    0   80G  0 disk 

Most upvoted comments

Hey all, I came up with a solid solution which I’ve had in production for the last couple of months. I finally had a chance to document it on my blog today; have a look and see if it helps you.

https://russell.ballestrini.net/aws-nvme-to-block-mapping/
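For anyone who can’t follow the link, the general idea behind this kind of mapping is to ask each EBS NVMe device which block-device mapping it was attached as, then rebuild the expected /dev/sdX names from that. A minimal sketch (not the blog’s exact code; it assumes the /sbin/ebsnvme-id helper that ships with Amazon Linux is available - verify what your AMI actually provides):

#!/usr/bin/env bash
# Sketch: map each EBS NVMe namespace back to the device_name it was
# requested as in Terraform (e.g. /dev/sdb) and recreate that name as a symlink.
# Assumes /sbin/ebsnvme-id exists (Amazon Linux); adapt for nvme-cli otherwise.
for dev in /dev/nvme*n1; do
  mapping=$(/sbin/ebsnvme-id -b "$dev" 2>/dev/null) || continue  # prints e.g. "sdb"
  ln -sf "$dev" "/dev/$mapping"
  echo "$dev -> /dev/$mapping"
done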

I can confirm I’m running into this as well, and I don’t even use Terraform.

I’m experiencing out-of-order device names when upgrading from Ubuntu 14.04 -> 18.04 (images based off the official AMI).

I only have two EBS block devices, a boot disk and a data disk, and even then the devices are out of order.

My provisioning system expects that /dev/nvme0n1 be root and /dev/nvme1n1 be data.

Disk /dev/nvme0n1: 120 GiB, 128849018880 bytes, 251658240 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/nvme1n1: 8 GiB, 8589934592 bytes, 16777216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x34a452b2

To add some data to this, here are the EBS devices in an ASG I have configured:

  ebs_block_device {
    device_name           = "/dev/xvdf"
    volume_type           = "gp2"
    volume_size           = 16
    delete_on_termination = true
    encrypted             = true
    iops                  = 0
    snapshot_id           = ""
    no_device             = false
  }

  ebs_block_device {
    device_name           = "/dev/xvdg"
    volume_type           = "gp2"
    volume_size           = 500
    delete_on_termination = true
    encrypted             = true
    iops                  = 0
    snapshot_id           = ""
    no_device             = false
  }

  ebs_block_device {
    device_name           = "/dev/xvdh"
    volume_type           = "gp2"
    volume_size           = 1000
    delete_on_termination = true
    encrypted             = true
    iops                  = 0
    snapshot_id           = ""
    no_device             = false
  }

Here’s that section from the output of aws autoscaling describe-launch-configurations; note that it’s an array, and note the order it’s in:

            "BlockDeviceMappings": [
                {
                    "DeviceName": "/dev/xvdh",
                    "Ebs": {
                        "VolumeSize": 1000,
                        "VolumeType": "gp2",
                        "DeleteOnTermination": true,
                        "Encrypted": true
                    }
                },
                {
                    "DeviceName": "/dev/xvdf",
                    "Ebs": {
                        "VolumeSize": 16,
                        "VolumeType": "gp2",
                        "DeleteOnTermination": true,
                        "Encrypted": true
                    }
                },
                {
                    "DeviceName": "/dev/xvdg",
                    "Ebs": {
                        "VolumeSize": 500,
                        "VolumeType": "gp2",
                        "DeleteOnTermination": true,
                        "Encrypted": true
                    }
                },
                {
                    "DeviceName": "/dev/sda1",
                    "Ebs": {
                        "VolumeSize": 8,
                        "VolumeType": "gp2",
                        "DeleteOnTermination": true
                    }
                }
            ],

Here’s the output of lsblk from a c5.large system launched using that LaunchConfig:

NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
nvme0n1     259:3    0    8G  0 disk
└─nvme0n1p1 259:4    0    8G  0 part /
nvme1n1     259:0    0 1000G  0 disk
nvme2n1     259:1    0   16G  0 disk
nvme3n1     259:2    0  500G  0 disk

As you can see, the in-OS ordering reflects the ordering of the BlockDeviceMappings array, which is out of order with respect to the desired arrangement expressed in the Terraform resource. This does not happen on older instance types (e.g., c4.large) because they still adopt the naming (if not the ordering) given in the launch configuration or instance definition.

Since AWS has stopped honoring that naming convention, I would hope that Terraform could perhaps start sorting that array according to device_name, so that we users could have at least a somewhat predictable naming scheme.

If you know the size of the disk, you can filter for it in the user-data script using lsblk and jq.

This works:

# find the device whose human-readable size starts with the ${storageSize} template value
DISKNAME=$(lsblk -dJo NAME,SIZE,MOUNTPOINT | jq -r '..|.?|select(.size|startswith("${storageSize}")).name')
sudo zpool create datadrive "$DISKNAME" -f

Pass in the size of the EBS drive via the ${storageSize} template variable.
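One caveat with the startswith match: if two volumes share a size prefix (say 80G and 800G), it can pick the wrong device. A slightly stricter variation, assuming you pass the size in bytes instead (the ${storageSizeBytes} variable here is hypothetical):

# match on the exact size in bytes rather than the human-readable prefix
DISKNAME=$(lsblk -dbno NAME,SIZE | awk -v sz="${storageSizeBytes}" '$2 == sz {print $1; exit}')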

The upshot is that Amazon somehow considers this to be working as designed. I’ve spoken with one of the Nitro engineers and, while he acknowledged that it makes life harder for users, I didn’t get the impression that they ever intend to correct this.

Their primary suggested “solution” was to use udev to order devices the way you expect. A secondary solution I started but abandoned was using snapshots of empty filesystems. The net of it is that I’ve just stopped buying as much EBS storage.
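For reference, the udev route they suggested is essentially what Amazon Linux already ships for EC2 NVMe devices. A hedged sketch of wiring up something equivalent on another distro (the rule text and the /sbin/ebsnvme-id path are assumptions - check what your AMI actually provides):

# Sketch: have udev recreate the requested /dev/sdX names for EBS NVMe volumes
cat >/etc/udev/rules.d/70-ebs-nvme.rules <<'EOF'
KERNEL=="nvme[0-9]*n[0-9]*", ENV{DEVTYPE}=="disk", ATTRS{model}=="Amazon Elastic Block Store", PROGRAM="/sbin/ebsnvme-id -b /dev/%k", SYMLINK+="%c"
EOF
udevadm control --reload-rules && udevadm trigger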

[edit] For completeness’ sake, I should point out that this “only” happens when you attach devices simultaneously, as with a Launch Config or Template. If you incrementally add devices to an instance, they attach in the expected order.

Bad news. I wrote a patch to switch ebs_block_device from a set to an array on both launch configurations and instances, and found out that the client-side ordering seems not to matter at all.

It’s entirely possible that I made the wrong changes, but Terraform and its internal tests seemed happy, and both the output of running terraform apply and terraform show presented the block devices in the written order. However, when checking the BlockDeviceMappings section from the AWS API (e.g. aws ec2 describe-instances), I found that they were not arranged in the order I’d created them - in fact, create/destroy produced different results several times.

I went back to the upstream provider code (1.40) and observed similar behavior - terraform apply happened to hash my 3 devices in reverse order (3-2-1), but the order in the AWS API afterwards was 1-3-2.

I’m going to attempt to submit a bug to AWS, but would suggest those of you affected do the same. Specifically, the new NVMe instances do not follow the bus order implied by device naming, but rather order by their appearance in BlockDeviceMappings. This is exacerbated when attaching multiple devices simultaneously (as with terraform), since they seem to be created asynchronously and attached to BlockDeviceMappings in order of completion.

I’ll add my voice here - it’s the same for aws_launch_configuration too. It doesn’t matter whether one uses the sdX or xvdX nomenclature, or what the ebs_block_device ordering is in the resource. Block-device ordering on the actual machine is consistent but out of order.

This seems to have been the case for a while. I back-revisioned to a 1.21.0 binary I had, and it still creates the disks out of order. The difference is that the older instance types that still use SCSI emulation (e.g., t2.large) respected the device names Terraform provided. The new instance types that default to /dev/nvmeXn1 do not, however - they’re strictly named in the order presented to the OS.

Hence if I have /dev/xvdf, /dev/xvdg, and /dev/xvdh on one of the new NVMe systems but the provider creates them in the order g-f-h (which it does consistently), they will be 2-1-3 in the OS.

This may represent a bug in both the Terraform provider and AWS - that the disks are created out of order, and that the hypervisor does not respect the requested name order.

I ran into this same issue, and discussions with AWS have uncovered that the ordering of disk device naming is not guaranteed to remain the same as defined at build time. This has to do with device discovery by the AMI: the order in which the devices are discovered determines the device names assigned.

This is definitely new behavior starting with the nvme* disks. I have had to implement some custom scripting that runs from user-data to map the devices as defined in Terraform to the actual mount points on the host. It also means you can’t use /dev/nvme1n1 or similar in fstab anymore; you must use the UUID to ensure proper mounting.
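To illustrate that last point, a minimal user-data sketch that mounts a data volume by UUID instead of by NVMe node (the device path, filesystem, and mount point are placeholders, not anything from this thread):

#!/usr/bin/env bash
# Sketch: format the data volume once, then mount it by UUID so the fstab
# entry keeps working even if the NVMe devices enumerate in a different order.
set -euo pipefail
DEV=/dev/nvme1n1          # placeholder - resolve this via one of the mappings above
blkid "$DEV" >/dev/null || mkfs.xfs "$DEV"   # format only if there is no filesystem yet
UUID=$(blkid -s UUID -o value "$DEV")
mkdir -p /data
grep -q "$UUID" /etc/fstab || echo "UUID=$UUID /data xfs defaults,nofail 0 2" >> /etc/fstab
mount -a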