talos: TPM-based encrypted install fails on physical host
Bug Report
When installing Talos v1.5.2 on a physical machine with secure boot and TPM-based disk encryption, it may fail, likely due to missing drivers.
Description
I tried to install Talos on a bare-metal device with secure boot and full disk encryption, following the latest guide. The only deviation I have made is using version 1.5.2 instead of the guide’s 1.5.0 when pulling the containers to build the installation ISO.
After generating a Talos config (including the tpm-disk-encryption.yaml) and applying it to the machine running the metal-amd64-secureboot.iso, the node will print [talos] failed to call key handler at slot 0: stat /dev/tpm0: no such file or directory every few seconds to the console, and part of the installation will fail. Eventually (after 10+ minutes), the node reboots into the new install in an unconfigured state. Trying to apply the config again will result in the same issue, and no service ever comes up.
When I boot the host into an Arch Linux 2023.09.01 installation ISO, I see both a /dev/tpm0 and a /dev/tpmrm0 device. When I install Talos without TPM disk encryption and run talosctl -n <host-ip> ls /dev | grep tpm, there is no output.
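The same device check can be run on any Linux system with a shell, such as the Arch live ISO. This is a minimal sketch, assuming a POSIX shell; the function name list_tpm_nodes is made up for illustration and is not part of any Talos or Arch tooling.

```shell
#!/bin/sh
# Print the TPM character devices found under a /dev-like directory.
# $1: directory to inspect (defaults to /dev).
# Example: list_tpm_nodes /dev
list_tpm_nodes() {
    dev_dir="${1:-/dev}"
    for node in "$dev_dir"/tpm[0-9]* "$dev_dir"/tpmrm[0-9]*; do
        # An unmatched glob stays literal, so confirm the path really exists.
        [ -e "$node" ] && echo "$node"
    done
    return 0
}
```

Empty output corresponds to the Talos result described above; on the Arch ISO the function would be expected to print both device paths.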
I also tried installing Talos on a Proxmox VM (using the same ISO) using the q35 machine type and the OVMF UEFI. I was able to successfully install Talos with full-disk encryption without issue, and the node came fully online very quickly.
I suspect the missing TPM device is due to the Talos kernel being built without drivers for many physical TPM chips (maybe only virtual ones are covered?). Looking at an Arch install on another host, there are several .ko.zst files listed under /lib/modules/<kernel-version>/kernel/drivers/char/tpm, whereas on Talos the entire char folder is missing (kernel version 6.1.51-talos). Note: See the end of this issue for the full Arch directory listing.
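One caveat when reading a module directory: drivers configured with =y are compiled into the kernel image and never appear as .ko files, so a missing char folder does not by itself show that a driver is absent. On a typical Linux system the built-in drivers are listed in /lib/modules/<kernel-version>/modules.builtin. The following is a minimal sketch, assuming that file exists on the system being inspected (whether it is readable on a Talos node is not confirmed here); the function name builtin_tpm_drivers is hypothetical.

```shell
#!/bin/sh
# Print TPM drivers that are compiled into the kernel rather than built as modules.
# $1: path to a modules.builtin file (defaults to the running kernel's copy).
builtin_tpm_drivers() {
    builtin_file="${1:-/lib/modules/$(uname -r)/modules.builtin}"
    [ -r "$builtin_file" ] || { echo "cannot read $builtin_file" >&2; return 1; }
    # Keep only entries under drivers/char/tpm and print the bare driver name.
    { grep 'drivers/char/tpm/' "$builtin_file" || true; } |
        sed -e 's|.*/||' -e 's|\.ko$||'
}
```

Calling builtin_tpm_drivers with no argument inspects the running kernel. Any driver it prints is available even though no matching .ko file exists on disk.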
The current commit for Talos’ kernel config file has the following set (from line 3127):
CONFIG_TCG_TPM=y
CONFIG_HW_RANDOM_TPM=y
CONFIG_TCG_TIS_CORE=y
CONFIG_TCG_TIS=y
# CONFIG_TCG_TIS_I2C is not set
# CONFIG_TCG_TIS_I2C_CR50 is not set
# CONFIG_TCG_TIS_I2C_ATMEL is not set
# CONFIG_TCG_TIS_I2C_INFINEON is not set
# CONFIG_TCG_TIS_I2C_NUVOTON is not set
# CONFIG_TCG_NSC is not set
# CONFIG_TCG_ATMEL is not set
# CONFIG_TCG_INFINEON is not set
# CONFIG_TCG_XEN is not set
CONFIG_TCG_CRB=y
# CONFIG_TCG_VTPM_PROXY is not set
# CONFIG_TCG_TIS_ST33ZP24_I2C is not set
# CONFIG_TELCLOCK is not set
# CONFIG_XILLYBUS is not set
# CONFIG_XILLYUSB is not set
Both Arch Linux and Debian 12 have the following:
CONFIG_TCG_TPM=y
CONFIG_HW_RANDOM_TPM=y
CONFIG_TCG_TIS_CORE=y
CONFIG_TCG_TIS=y
CONFIG_TCG_TIS_SPI=m
CONFIG_TCG_TIS_SPI_CR50=y
CONFIG_TCG_TIS_I2C=m
CONFIG_TCG_TIS_I2C_CR50=m
CONFIG_TCG_TIS_I2C_ATMEL=m
CONFIG_TCG_TIS_I2C_INFINEON=m
CONFIG_TCG_TIS_I2C_NUVOTON=m
CONFIG_TCG_NSC=m
CONFIG_TCG_ATMEL=m
CONFIG_TCG_INFINEON=m
CONFIG_TCG_XEN=m
CONFIG_TCG_CRB=y
CONFIG_TCG_VTPM_PROXY=m
CONFIG_TCG_TIS_ST33ZP24=m
CONFIG_TCG_TIS_ST33ZP24_I2C=m
CONFIG_TCG_TIS_ST33ZP24_SPI=m
CONFIG_TELCLOCK=m
CONFIG_XILLYBUS_CLASS=m
CONFIG_XILLYBUS=m
CONFIG_XILLYBUS_PCIE=m
CONFIG_XILLYUSB=m
The names of these currently-disabled options (e.g. CONFIG_TCG_TIS_I2C_INFINEON) line up with the .ko.zst files located on Arch (e.g. tpm_i2c_infineon.ko.zst). I suspect that if these were enabled, the Talos kernel would be able to find my host’s TPM chip and install without issue.
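The comparison between the two config listings can be reproduced mechanically. This is a minimal sketch, assuming both kernel configs have been saved as plain-text files; the function names enabled_options and missing_options and the file names in the usage note are made up for illustration.

```shell
#!/bin/sh
# Print options with the given prefix that are enabled (=y or =m) in a config file.
# $1: kernel config file, $2: option prefix (e.g. CONFIG_TCG_)
enabled_options() {
    { grep -E "^$2[A-Za-z0-9_]*=(y|m)\$" "$1" || true; } | cut -d= -f1 | sort -u
}

# Print options enabled in the reference config but not in the target config.
# $1: target config, $2: reference config, $3: option prefix
missing_options() {
    ref_list=$(mktemp)
    tgt_list=$(mktemp)
    enabled_options "$2" "$3" > "$ref_list"
    enabled_options "$1" "$3" > "$tgt_list"
    # comm -23 keeps only the lines unique to the first (reference) list.
    comm -23 "$ref_list" "$tgt_list"
    rm -f "$ref_list" "$tgt_list"
}
```

For example, missing_options talos.config arch.config CONFIG_TCG_ would list the TPM driver options that the reference distribution enables and the Talos config does not. Lines of the form "# CONFIG_... is not set" are ignored because they do not match the =y or =m pattern.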
TL;DR Please enable all TPM drivers in the kernel config.
Logs
N/A (AFAIK I can’t pull any logs from a broken install that rejects all talosctl commands?)
Environment
- Talos version:
  Client:
      Tag: v1.5.2
      SHA: undefined
      Built:
      Go version: go1.21.1
      OS/Arch: linux/amd64
  Server:
      NODE: 192.168.179.11
      Tag: v1.5.2
      SHA: 318c66b9
      Built:
      Go version: go1.20.8
      OS/Arch: linux/amd64
      Enabled: RBAC
- Kubernetes version:
  Client Version: v1.28.2
  Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
  Server Version: v1.28.1
- Platform:
  Physical host
      Erying Polestar i9 ES (yes this is a weird motherboard)
  VM (used for testing only)
      VM running on Proxmox with q35 machine type and OVMF UEFI
Arch module directory listing
$ tree /lib/modules/6.5.3-arch1-1/kernel/drivers/char/tpm
/lib/modules/6.5.3-arch1-1/kernel/drivers/char/tpm
├── st33zp24
│   ├── tpm_st33zp24_i2c.ko.zst
│   ├── tpm_st33zp24.ko.zst
│   └── tpm_st33zp24_spi.ko.zst
├── tpm_atmel.ko.zst
├── tpm_i2c_atmel.ko.zst
├── tpm_i2c_infineon.ko.zst
├── tpm_i2c_nuvoton.ko.zst
├── tpm_infineon.ko.zst
├── tpm_nsc.ko.zst
├── tpm_tis_i2c_cr50.ko.zst
├── tpm_tis_i2c.ko.zst
├── tpm_tis_spi.ko.zst
├── tpm_vtpm_proxy.ko.zst
└── xen-tpmfront.ko.zst
2 directories, 14 files
About this issue
- State: closed
- Created 9 months ago
- Comments: 19 (10 by maintainers)
We keep the kernel as minimal as possible, so unless someone needs it we’ll keep it disabled or try to ship it as an extension.
Talos 1.5.3 would ship with the latest LTS kernel at the time of release
Do note that we don’t carry any kernel patches and will wait for an upstream fix.
That’s entirely possible, things take time 😅
Okay, here you go:
ghcr.io/frezbo/imager:v1.6.0-alpha.0-38-gc5bd0ac5c@sha256:f8b49d280e808ccd7496b9bf274d5cf4f8e7067bcc8529521c97d0a47017b2b5
sure, gimme a few hours 😅