talos: TPM-based encrypted install fails on physical host

Bug Report

Installing Talos v1.5.2 on a physical machine with secure boot and TPM-based disk encryption can fail, likely because the Talos kernel lacks a driver for the host's TPM chip.

Description

I tried to install Talos on a bare-metal device with secure boot and full disk encryption, following the latest guide. The only deviation I have made is using version 1.5.2 instead of the guide’s 1.5.0 when pulling the containers to build the installation ISO.

After generating a Talos config (including the tpm-disk-encryption.yaml) and applying it to the machine running the metal-amd64-secureboot.iso, the node will print [talos] failed to call key handler at slot 0: stat /dev/tpm0: no such file or directory every few seconds to the console, and part of the installation will fail. Eventually (after 10+ minutes), the node reboots into the new install in an unconfigured state. Trying to apply the config again will result in the same issue, and no service ever comes up.

When I boot the host into an Arch Linux 2023.09.01 installation ISO, I see both a /dev/tpm0 and a /dev/tpmrm0 device. When I install Talos without TPM disk encryption and run talosctl -n <host-ip> ls /dev | grep tpm, there is no output.
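For anyone wanting to repeat the check from a general-purpose live ISO (Talos itself has no shell, so this only applies to the Arch side of the comparison), here is a minimal sketch of the same test. The function name find_tpm_devices is hypothetical, and matching on the "tpm" prefix is an assumption that covers the tpm0 and tpmrm0 names mentioned above.

```python
import os


def find_tpm_devices(dev_dir="/dev"):
    """Return sorted entry names in dev_dir that look like TPM device nodes."""
    try:
        names = os.listdir(dev_dir)
    except FileNotFoundError:
        return []
    # Assumption: TPM nodes are named tpm<N> or tpmrm<N> (resource manager).
    return sorted(name for name in names if name.startswith("tpm"))


if __name__ == "__main__":
    devices = find_tpm_devices()
    if devices:
        print("TPM device nodes:", ", ".join(devices))
    else:
        print("no TPM device nodes found; the kernel may lack a driver for this chip")
```

An empty result on a machine that is known to have a TPM is the same symptom as the empty talosctl listing described above.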

I also tried installing Talos on a Proxmox VM (using the same ISO) using the q35 machine type and the OVMF UEFI. I was able to successfully install Talos with full-disk encryption without issue, and the node came fully online very quickly.

I suspect the TPM device is missing because the Talos kernel is built without drivers for many physical TPM chips (maybe only virtual ones work?). Looking at an Arch install on another host, there are several .ko.zst files listed under /lib/modules/<kernel-version>/kernel/drivers/char/tpm, whereas on Talos the entire char folder is missing (kernel version 6.1.51-talos). Note: See the end of this issue for the full Arch directory listing.

The current commit for Talos’ kernel config file has the following set (from line 3127):

CONFIG_TCG_TPM=y
CONFIG_HW_RANDOM_TPM=y
CONFIG_TCG_TIS_CORE=y
CONFIG_TCG_TIS=y
# CONFIG_TCG_TIS_I2C is not set
# CONFIG_TCG_TIS_I2C_CR50 is not set
# CONFIG_TCG_TIS_I2C_ATMEL is not set
# CONFIG_TCG_TIS_I2C_INFINEON is not set
# CONFIG_TCG_TIS_I2C_NUVOTON is not set
# CONFIG_TCG_NSC is not set
# CONFIG_TCG_ATMEL is not set
# CONFIG_TCG_INFINEON is not set
# CONFIG_TCG_XEN is not set
CONFIG_TCG_CRB=y
# CONFIG_TCG_VTPM_PROXY is not set
# CONFIG_TCG_TIS_ST33ZP24_I2C is not set
# CONFIG_TELCLOCK is not set
# CONFIG_XILLYBUS is not set
# CONFIG_XILLYUSB is not set

Both Arch Linux and Debian 12 have the following:

CONFIG_TCG_TPM=y
CONFIG_HW_RANDOM_TPM=y
CONFIG_TCG_TIS_CORE=y
CONFIG_TCG_TIS=y
CONFIG_TCG_TIS_SPI=m
CONFIG_TCG_TIS_SPI_CR50=y
CONFIG_TCG_TIS_I2C=m
CONFIG_TCG_TIS_I2C_CR50=m
CONFIG_TCG_TIS_I2C_ATMEL=m
CONFIG_TCG_TIS_I2C_INFINEON=m
CONFIG_TCG_TIS_I2C_NUVOTON=m
CONFIG_TCG_NSC=m
CONFIG_TCG_ATMEL=m
CONFIG_TCG_INFINEON=m
CONFIG_TCG_XEN=m
CONFIG_TCG_CRB=y
CONFIG_TCG_VTPM_PROXY=m
CONFIG_TCG_TIS_ST33ZP24=m
CONFIG_TCG_TIS_ST33ZP24_I2C=m
CONFIG_TCG_TIS_ST33ZP24_SPI=m
CONFIG_TELCLOCK=m
CONFIG_XILLYBUS_CLASS=m
CONFIG_XILLYBUS=m
CONFIG_XILLYBUS_PCIE=m
CONFIG_XILLYUSB=m

The names of these currently-disabled options (e.g. CONFIG_TCG_INFINEON) line up with the .ko.zst files located on Arch (e.g. tpm_infineon.ko.zst). I suspect that if these were enabled, the Talos kernel would be able to find my host’s TPM chip and install without issue.
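To make the comparison between the two listings less manual, here is a minimal sketch that parses two kernel config fragments and reports the options enabled in one but not in the other. The helper names parse_kconfig and missing_options are hypothetical, and the embedded fragments are abbreviated copies of the listings above, not the full configs.

```python
import re

# Abbreviated fragments copied from the two listings in this issue.
TALOS_CONFIG = """\
CONFIG_TCG_TPM=y
CONFIG_TCG_TIS=y
# CONFIG_TCG_TIS_I2C is not set
# CONFIG_TCG_INFINEON is not set
CONFIG_TCG_CRB=y
"""

ARCH_CONFIG = """\
CONFIG_TCG_TPM=y
CONFIG_TCG_TIS=y
CONFIG_TCG_TIS_SPI=m
CONFIG_TCG_TIS_I2C=m
CONFIG_TCG_INFINEON=m
CONFIG_TCG_CRB=y
"""

_SET = re.compile(r"^(CONFIG_\w+)=(.+)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Map option name -> value; options marked 'is not set' map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        set_match = _SET.match(line)
        unset_match = _UNSET.match(line)
        if set_match:
            options[set_match.group(1)] = set_match.group(2)
        elif unset_match:
            options[unset_match.group(1)] = "n"
    return options


def missing_options(target, reference):
    """Options built in (y) or modular (m) in reference but off or absent in target."""
    return {
        name: value
        for name, value in reference.items()
        if value in ("y", "m") and target.get(name, "n") == "n"
    }


if __name__ == "__main__":
    talos = parse_kconfig(TALOS_CONFIG)
    arch = parse_kconfig(ARCH_CONFIG)
    for name, value in sorted(missing_options(talos, arch).items()):
        print(f"{name}={value}")
```

Options that do not appear in the Talos fragment at all (such as CONFIG_TCG_TIS_SPI) are treated as disabled, which matches how an absent option behaves in a built kernel.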

TL;DR Please enable all TPM drivers in the kernel config.

Logs

N/A (AFAIK I can’t pull any logs from a broken install that rejects all talosctl commands)

Environment

  • Talos version:
Client:
        Tag:         v1.5.2
        SHA:         undefined
        Built:
        Go version:  go1.21.1
        OS/Arch:     linux/amd64
Server:
        NODE:        192.168.179.11
        Tag:         v1.5.2
        SHA:         318c66b9
        Built:
        Go version:  go1.20.8
        OS/Arch:     linux/amd64
        Enabled:     RBAC
  • Kubernetes version:
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.1
  • Platform:

Physical host

Erying Polestar i9 ES (yes this is a weird motherboard)

VM (used for testing only)

VM running on Proxmox with q35 machine type and OVMF UEFI

Arch module directory listing

$ tree /lib/modules/6.5.3-arch1-1/kernel/drivers/char/tpm
/lib/modules/6.5.3-arch1-1/kernel/drivers/char/tpm
├── st33zp24
│   ├── tpm_st33zp24_i2c.ko.zst
│   ├── tpm_st33zp24.ko.zst
│   └── tpm_st33zp24_spi.ko.zst
├── tpm_atmel.ko.zst
├── tpm_i2c_atmel.ko.zst
├── tpm_i2c_infineon.ko.zst
├── tpm_i2c_nuvoton.ko.zst
├── tpm_infineon.ko.zst
├── tpm_nsc.ko.zst
├── tpm_tis_i2c_cr50.ko.zst
├── tpm_tis_i2c.ko.zst
├── tpm_tis_spi.ko.zst
├── tpm_vtpm_proxy.ko.zst
└── xen-tpmfront.ko.zst

2 directories, 14 files

About this issue

  • State: closed
  • Created 9 months ago
  • Comments: 19 (10 by maintainers)

Most upvoted comments

however I don’t see any reason why the rest of the TPM drivers shouldn’t be enabled in the Talos kernel build config

We keep the kernel as minimal as possible, so unless someone needs a driver we’ll keep it disabled or try to ship it as an extension.

If you could create a build of Talos with kernel 6.1.53 (or 6.1.54, which was released about three hours ago 😄) I can confirm if Talos installs successfully with TPM encryption on this machine.

Talos 1.5.3 would ship with the latest LTS kernel at the time of release

Agreed 😄. Absolutely no rush but if you are able to build a kernel with the drivers enabled + that patch I linked, I can see if the issue is resolved tomorrow evening/night.

Really appreciate the fast response btw; getting a new build in some 40 minutes is incredible! ❤️

do note that we don’t carry any kernel patches and will wait for an upstream fix

Googling that error message reveals several threads from ~mid-August with the exact same error code. Looks like it’s a kernel bug that must have been backported to 6.1?

that’s entirely possible, things take time 😅

Okay, here you go: ghcr.io/frezbo/imager:v1.6.0-alpha.0-38-gc5bd0ac5c@sha256:f8b49d280e808ccd7496b9bf274d5cf4f8e7067bcc8529521c97d0a47017b2b5

sure, gimme a few hours 😅