buildah: Unprivileged Rootless Buildah on Kubernetes fails due to newgidmap/newuidmap: Operation not permitted error
Description I have an unprivileged rootless Buildah container running on kubernetes/CRI-O on a Centos 7.9 host using VFS storage. When running any buildah command I receive the following output:
WARN[0000] error running newgidmap: exit status 1: newgidmap: write to gid_map failed: Operation not permitted
WARN[0000] falling back to single mapping
WARN[0000] error running newuidmap: exit status 1: newuidmap: write to uid_map failed: Operation not permitted
WARN[0000] falling back to single mapping
The buildah bud command eventually fails due to the following error:
ERRO[0001] Error while applying layer: ApplyLayer exit status 1 stdout: stderr: potentially insufficient UIDs or GIDs available in user namespace (requested 192:192 for /run/systemd/netif): Check /etc/subuid and /etc/subgid: lchown /run/systemd/netif: invalid argument
error creating build container: writing blob: adding layer with blob "sha256:b297047bc4b568dedc27cbeda009a435b0714fd2f681870601890a4b1c7531e8": ApplyLayer exit status 1 stdout: stderr: potentially insufficient UIDs or GIDs available in user namespace (requested 192:192 for /run/systemd/netif): Check /etc/subuid and /etc/subgid: lchown /run/systemd/netif: invalid argument
This issue is not present when running as user build with privileged: true enabled on the pod or when running as the root user without privileged mode enabled.
Steps to reproduce the issue:
- Deploy buildah to kubernetes running as user
build - Run
buildah bud
Describe the results you received: Buildah is unable to create new gid or uid maps:
[build@runner-le6vnazt-project-140-concurrent-026xwb /]$ buildah info
WARN[0000] error running newgidmap: exit status 1: newgidmap: write to gid_map failed: Operation not permitted
WARN[0000] falling back to single mapping
WARN[0000] error running newuidmap: exit status 1: newuidmap: write to uid_map failed: Operation not permitted
WARN[0000] falling back to single mapping
Describe the results you expected: Expected buildah to be able to create mappings
Output of rpm -q buildah or apt list buildah:
buildah-1.23.3-2.fc35.x86_64
Output of buildah version:
buildah version 1.23.3 (image-spec 1.0.1-dev, runtime-spec 1.0.2-dev)
Output of cat /etc/*release:
Fedora release 35 (Thirty Five)
NAME="Fedora Linux"
VERSION="35 (Container Image)"
ID=fedora
VERSION_ID=35
VERSION_CODENAME=""
PLATFORM_ID="platform:f35"
PRETTY_NAME="Fedora Linux 35 (Container Image)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:35"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f35/system-administrators-guide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=35
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=35
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="Container Image"
VARIANT_ID=container
Fedora release 35 (Thirty Five)
Fedora release 35 (Thirty Five)
Output of uname -a:
Linux runner-le6vnazt-project-140-concurrent-026xwb 3.10.0-1160.25.1.el7.x86_64 #1 SMP Wed Apr 28 21:49:45 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Output of cat /etc/containers/storage.conf:
# This file is is the configuration file for all tools
# that use the containers/storage library. The storage.conf file
# overrides all other storage.conf files. Container engines using the
# container/storage library do not inherit fields from other storage.conf
# files.
#
# Note: The storage.conf file overrides other storage.conf files based on this precedence:
# /usr/containers/storage.conf
# /etc/containers/storage.conf
# $HOME/.config/containers/storage.conf
# $XDG_CONFIG_HOME/containers/storage.conf (If XDG_CONFIG_HOME is set)
# See man 5 containers-storage.conf for more information
# The "container storage" table contains all of the server options.
[storage]
# Default Storage Driver, Must be set for proper operation.
driver = "vfs"
# Temporary storage location
runroot = "/run/containers/storage"
# Primary Read/Write location of container storage
# When changing the graphroot location on an SELINUX system, you must
# ensure the labeling matches the default locations labels with the
# following commands:
# semanage fcontext -a -e /var/lib/containers/storage /NEWSTORAGEPATH
# restorecon -R -v /NEWSTORAGEPATH
graphroot = "/var/lib/containers/storage"
# Storage path for rootless users
#
# rootless_storage_path = "$HOME/.local/share/containers/storage"
[storage.options]
# Storage options to be passed to underlying storage drivers
# AdditionalImageStores is used to pass paths to additional Read/Only image stores
# Must be comma separated list.
additionalimagestores = [
"/var/lib/shared",
]
# Remap-UIDs/GIDs is the mapping from UIDs/GIDs as they should appear inside of
# a container, to the UIDs/GIDs as they should appear outside of the container,
# and the length of the range of UIDs/GIDs. Additional mapped sets can be
# listed and will be heeded by libraries, but there are limits to the number of
# mappings which the kernel will allow when you later attempt to run a
# container.
#
#remap-uids = 0:1668442479:65536
#remap-gids = 0:1668442479:65536
# Remap-User/Group is a user name which can be used to look up one or more UID/GID
# ranges in the /etc/subuid or /etc/subgid file. Mappings are set up starting
# with an in-container ID of 0 and then a host-level ID taken from the lowest
# range that matches the specified name, and using the length of that range.
# Additional ranges are then assigned, using the ranges which specify the
# lowest host-level IDs first, to the lowest not-yet-mapped in-container ID,
# until all of the entries have been used for maps.
#
# remap-user = "containers"
# remap-group = "containers"
# Root-auto-userns-user is a user name which can be used to look up one or more UID/GID
# ranges in the /etc/subuid and /etc/subgid file. These ranges will be partitioned
# to containers configured to create automatically a user namespace. Containers
# configured to automatically create a user namespace can still overlap with containers
# having an explicit mapping set.
# This setting is ignored when running as rootless.
# root-auto-userns-user = "storage"
#
# Auto-userns-min-size is the minimum size for a user namespace created automatically.
# auto-userns-min-size=1024
#
# Auto-userns-max-size is the minimum size for a user namespace created automatically.
# auto-userns-max-size=65536
[storage.options.overlay]
# ignore_chown_errors can be set to allow a non privileged user running with
# a single UID within a user namespace to run containers. The user can pull
# and use any image even those with multiple uids. Note multiple UIDs will be
# squashed down to the default uid in the container. These images will have no
# separation between the users in the container. Only supported for the overlay
# and vfs drivers.
#ignore_chown_errors = "false"
# Inodes is used to set a maximum inodes of the container image.
# inodes = ""
# Path to an helper program to use for mounting the file system instead of mounting it
# directly.
mount_program = "/usr/bin/fuse-overlayfs"
# mountopt specifies comma separated list of extra mount options
mountopt = "nodev,fsync=0"
# Set to skip a PRIVATE bind mount on the storage home directory.
# skip_mount_home = "false"
# Size is used to set a maximum size of the container image.
# size = ""
# ForceMask specifies the permissions mask that is used for new files and
# directories.
#
# The values "shared" and "private" are accepted.
# Octal permission masks are also accepted.
#
# "": No value specified.
# All files/directories, get set with the permissions identified within the
# image.
# "private": it is equivalent to 0700.
# All files/directories get set with 0700 permissions. The owner has rwx
# access to the files. No other users on the system can access the files.
# This setting could be used with networked based homedirs.
# "shared": it is equivalent to 0755.
# The owner has rwx access to the files and everyone else can read, access
# and execute them. This setting is useful for sharing containers storage
# with other users. For instance have a storage owned by root but shared
# to rootless users as an additional store.
# NOTE: All files within the image are made readable and executable by any
# user on the system. Even /etc/shadow within your image is now readable by
# any user.
#
# OCTAL: Users can experiment with other OCTAL Permissions.
#
# Note: The force_mask Flag is an experimental feature, it could change in the
# future. When "force_mask" is set the original permission mask is stored in
# the "user.containers.override_stat" xattr and the "mount_program" option must
# be specified. Mount programs like "/usr/bin/fuse-overlayfs" present the
# extended attribute permissions to processes within containers rather then the
# "force_mask" permissions.
#
# force_mask = ""
[storage.options.thinpool]
# Storage Options for thinpool
# autoextend_percent determines the amount by which pool needs to be
# grown. This is specified in terms of % of pool size. So a value of 20 means
# that when threshold is hit, pool will be grown by 20% of existing
# pool size.
# autoextend_percent = "20"
# autoextend_threshold determines the pool extension threshold in terms
# of percentage of pool size. For example, if threshold is 60, that means when
# pool is 60% full, threshold has been hit.
# autoextend_threshold = "80"
# basesize specifies the size to use when creating the base device, which
# limits the size of images and containers.
# basesize = "10G"
# blocksize specifies a custom blocksize to use for the thin pool.
# blocksize="64k"
# directlvm_device specifies a custom block storage device to use for the
# thin pool. Required if you setup devicemapper.
# directlvm_device = ""
# directlvm_device_force wipes device even if device already has a filesystem.
# directlvm_device_force = "True"
# fs specifies the filesystem type to use for the base device.
# fs="xfs"
# log_level sets the log level of devicemapper.
# 0: LogLevelSuppress 0 (Default)
# 2: LogLevelFatal
# 3: LogLevelErr
# 4: LogLevelWarn
# 5: LogLevelNotice
# 6: LogLevelInfo
# 7: LogLevelDebug
# log_level = "7"
# min_free_space specifies the min free space percent in a thin pool require for
# new device creation to succeed. Valid values are from 0% - 99%.
# Value 0% disables
# min_free_space = "10%"
# mkfsarg specifies extra mkfs arguments to be used when creating the base
# device.
# mkfsarg = ""
# metadata_size is used to set the `pvcreate --metadatasize` options when
# creating thin devices. Default is 128k
# metadata_size = ""
# Size is used to set a maximum size of the container image.
# size = ""
# use_deferred_removal marks devicemapper block device for deferred removal.
# If the thinpool is in use when the driver attempts to remove it, the driver
# tells the kernel to remove it as soon as possible. Note this does not free
# up the disk space, use deferred deletion to fully remove the thinpool.
# use_deferred_removal = "True"
# use_deferred_deletion marks thinpool device for deferred deletion.
# If the device is busy when the driver attempts to delete it, the driver
# will attempt to delete device every 30 seconds until successful.
# If the program using the driver exits, the driver will continue attempting
# to cleanup the next time the driver is used. Deferred deletion permanently
# deletes the device and all data stored in device will be lost.
# use_deferred_deletion = "True"
# xfs_nospace_max_retries specifies the maximum number of retries XFS should
# attempt to complete IO when ENOSPC (no space) error is returned by
# underlying storage device.
# xfs_nospace_max_retries = "0"
Output of buildah info:
WARN[0000] error running newgidmap: exit status 1: newgidmap: write to gid_map failed: Operation not permitted
WARN[0000] falling back to single mapping
WARN[0000] error running newuidmap: exit status 1: newuidmap: write to uid_map failed: Operation not permitted
WARN[0000] falling back to single mapping
{
"host": {
"CgroupVersion": "v1",
"Distribution": {
"distribution": "fedora",
"version": "35"
},
"MemFree": 468131840,
"MemTotal": 16259915776,
"OCIRuntime": "crun",
"SwapFree": 8232628224,
"SwapTotal": 8254386176,
"arch": "amd64",
"cpus": 8,
"hostname": "runner-le6vnazt-project-140-concurrent-026xwb",
"kernel": "3.10.0-1160.25.1.el7.x86_64",
"os": "linux",
"rootless": true,
"uptime": "310h 24m 16.22s (Approximately 12.92 days)"
},
"store": {
"ContainerStore": {
"number": 0
},
"GraphDriverName": "vfs",
"GraphOptions": null,
"GraphRoot": "/home/build/.local/share/containers/storage",
"GraphStatus": {},
"ImageStore": {
"number": 0
},
"RunRoot": "/var/tmp/containers-user-1000/containers"
}
}
Output of capsh --print:
Current: cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap=i
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap
Ambient set =
Current IAB: cap_chown,cap_dac_override,!cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,!cap_linux_immutable,cap_net_bind_service,!cap_net_broadcast,!cap_net_admin,!cap_net_raw,!cap_ipc_lock,!cap_ipc_owner,!cap_sys_module,!cap_sys_rawio,cap_sys_chroot,!cap_sys_ptrace,!cap_sys_pacct,!cap_sys_admin,!cap_sys_boot,!cap_sys_nice,!cap_sys_resource,!cap_sys_time,!cap_sys_tty_config,cap_mknod,!cap_lease,cap_audit_write,!cap_audit_control,cap_setfcap,!cap_mac_override,!cap_mac_admin,!cap_syslog,!cap_wake_alarm,!cap_block_suspend
Securebits: 00/0x0/1'b0 (no-new-privs=1)
secure-noroot: no (unlocked)
secure-no-suid-fixup: no (unlocked)
secure-keep-caps: no (unlocked)
secure-no-ambient-raise: no (unlocked)
uid=1000(build) euid=1000(build)
gid=1000(build)
groups=1000(build)
Guessed mode: UNCERTAIN (0)
Output of /etc/subgid and /etc/subuid:
build:2000:50000
Output of rpm -qa | grep shadow:
shadow-utils-4.9-9.fc35.x86_64
Output of getcap /usr/bin/newuidmap /usr/bin/newgidmap:
/usr/bin/newuidmap cap_setuid=ep
/usr/bin/newgidmap cap_setgid=ep
Output of cat /proc/sys/users/max_user_namespace on Centos 7.9 host::
15000
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 21 (5 by maintainers)
From my understanding CAP_SETUID is equivalent to running as root, as a process with CAP_SETUID can set their UID to anything including 0 i.e. root. Are there any ways to build containers inside of a container (such as in kubernetes) without this capability?
No change when adding
--isolation=chroot