podman with GitLab Registry: "invalid status code from registry 404"

Discussed in https://github.com/containers/podman/discussions/16842


Originally posted by 2xB October 6, 2022

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

When pushing to a GitLab Container Registry, I sometimes randomly get Error: trying to reuse blob sha256:... at destination: Requesting bearer token: invalid status code from registry 404 (Not Found). Weirdly enough, when I upload multiple images built from the same base image, I get this error message once for every image I try to push; the second push of each image then succeeds.
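For context, the failing step is the Docker Registry v2 bearer-token handshake: the client first probes the registry, receives a 401 challenge with a WWW-Authenticate: Bearer header, and then requests a token from the advertised realm (on GitLab that realm is the /jwt/auth endpoint discussed in the comments below). A minimal sketch of that handshake in Go follows; the hostnames, service name, and repository scope are hypothetical placeholders, not values from this report:

```go
// Minimal sketch of the Docker Registry v2 bearer-token handshake that the
// "Requesting bearer token" part of the error refers to. The hostnames,
// service name, and repository scope are hypothetical placeholders.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

func main() {
	// 1. Probe the registry API. An unauthenticated request is answered
	//    with 401 plus a WWW-Authenticate: Bearer realm="...",service="..."
	//    challenge header.
	probe, err := http.Get("https://registry.gitlab.example/v2/")
	if err != nil {
		panic(err)
	}
	probe.Body.Close()
	fmt.Println("challenge:", probe.Header.Get("WWW-Authenticate"))

	// 2. Request a token from the advertised realm; on GitLab the realm is
	//    the /jwt/auth endpoint. The 404 reported in this issue is returned
	//    by this request, not by the blob upload itself.
	q := url.Values{
		"service": {"container_registry"},
		"scope":   {"repository:group/project:pull,push"},
	}
	tokResp, err := http.Get("https://gitlab.example/jwt/auth?" + q.Encode())
	if err != nil {
		panic(err)
	}
	defer tokResp.Body.Close()
	if tokResp.StatusCode != http.StatusOK {
		// podman surfaces this case as "invalid status code from
		// registry 404 (Not Found)".
		fmt.Println("token request failed:", tokResp.Status)
		return
	}

	// 3. On success the realm returns a JSON document with the token that
	//    is then sent as "Authorization: Bearer <token>" to the registry.
	var tok struct {
		Token string `json:"token"`
	}
	if err := json.NewDecoder(tokResp.Body).Decode(&tok); err != nil {
		panic(err)
	}
	fmt.Println("got bearer token of length", len(tok.Token))
}
```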

Steps to reproduce the issue:

  1. Have a GitLab project with a GitLab Runner that can build and push Podman images (e.g. via Shell executor or custom executor)

  2. Use it very often (build and push many images)

  3. Sometimes (rarely) get this error during podman push ... (retrying the push succeeds; see the sketch after this list)
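Since the description above notes that a second push of the same image succeeds, a blunt workaround in CI is to wrap the push in a small retry loop. A sketch in Go; the image reference is a hypothetical placeholder, and a plain shell loop would work just as well:

```go
// Sketch of a CI workaround based on the observation above that a second
// push of the same image succeeds. The image reference is a hypothetical
// placeholder.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"time"
)

func main() {
	image := "registry.gitlab.example/group/project/app:latest"
	for attempt := 1; attempt <= 3; attempt++ {
		cmd := exec.Command("podman", "push", image)
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		err := cmd.Run()
		if err == nil {
			return // push succeeded
		}
		fmt.Fprintf(os.Stderr, "push attempt %d failed: %v\n", attempt, err)
		time.Sleep(time.Duration(attempt) * 5 * time.Second) // crude backoff
	}
	os.Exit(1)
}
```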

Describe the results you received:

Error: trying to reuse blob sha256:... at destination: Requesting bearer token: invalid status code from registry 404 (Not Found)

Describe the results you expected:

Successfully pushing the image to the GitLab Container Registry

Additional information you deem important (e.g. issue happens only occasionally):

I have a full log of podman --log-level=debug push ... from a run where it fails. I probably can’t post the full log, but if there’s something specific to check in that log, please let me know!

Output of podman version:

Client:       Podman Engine
Version:      4.2.1
API Version:  4.2.1
Go Version:   go1.18.5
Built:        Wed Sep  7 19:58:19 2022
OS/Arch:      linux/amd64

Output of podman info:

host:
  arch: amd64
  buildahVersion: 1.27.0
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  - rdma
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.1.4-2.fc36.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.4, commit: '
  cpuUtilization:
    idlePercent: 94.36
    systemPercent: 0.85
    userPercent: 4.78
  cpus: 6
  distribution:
    distribution: fedora
    variant: container
    version: "36"
  eventLogger: file
  hostname: 04e8b959c867
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.4.0-126-generic
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 2666459136
  memTotal: 4914487296
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.6-2.fc36.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.6
      commit: 18cf2efbb8feb2b2f20e316520e0fd0b6c41ef4d
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-0.2.beta.0.fc36.x86_64
    version: |-
      slirp4netns version 1.2.0-beta.0
      commit: 477db14a24ff1a3de3a705e51ca2c4c1fe3dda64
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 4243320832
  swapTotal: 4294963200
  uptime: 26h 8m 44.00s (Approximately 1.08 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.imagestore: /var/lib/shared
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.9-1.fc36.x86_64
      Version: |-
        fusermount3 version: 3.10.5
        fuse-overlayfs: version 1.9
        FUSE library version 3.10.5
        using FUSE kernel interface version 7.31
    overlay.mountopt: nodev,fsync=0
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 40501673984
  graphRootUsed: 16835969024
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /tmp/custom-executor1517200462
  imageStore:
    number: 57
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.2.1
  Built: 1662580699
  BuiltTime: Wed Sep  7 19:58:19 2022
  GitCommit: ""
  GoVersion: go1.18.5
  Os: linux
  OsArch: linux/amd64
  Version: 4.2.1

Package info (e.g. output of rpm -q podman or apt list podman or brew info podman):

podman/stable image from quay.io, run inside another Podman instance inside a VirtualBox VM

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

podman/stable image from quay.io, run inside another Podman instance inside a VirtualBox VM

About this issue

  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 19 (1 by maintainers)

Most upvoted comments

It’s worth noting, though, that site-wide on GitLab, any time there’s an auth/permission restriction (e.g. trying to view a private project you don’t have access to), the server responds with 404.

That concept of pretending something doesn’t exist is fine, but the server still needs to follow the other constraints of the protocol.

E.g. that other ticket shows the server returning a 404 on GET /jwt/auth?…. I read that as claiming not that the parameters refer to a nonexistent repo, but that /jwt/auth itself doesn’t exist, and that doesn’t really make sense.

What is the client to do with that information? We could, reasonably and consistently with that concept, have this operation treat 404 (“does not exist”) the same as 403 (Forbidden). But 403 is explicitly one of the results we don’t retry on, because a retry wouldn’t help and could hurt (by locking the user out through repeated authentication failures). The resulting classification is sketched below.
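The dilemma is easy to see in code: a client that buckets token-endpoint status codes for retry purposes has no good bucket for GitLab's 404. The sketch below mirrors the reasoning in this comment; it is not the actual containers/image logic:

```go
// Sketch of the retry-classification dilemma described in this comment.
// It mirrors the reasoning above, not the actual containers/image code.
package main

import (
	"fmt"
	"net/http"
)

// shouldRetry reports whether retrying a failed token request is worthwhile.
func shouldRetry(status int) bool {
	switch {
	case status >= 500:
		// Outage / overload on the server side: a retry may help.
		return true
	case status == http.StatusUnauthorized || status == http.StatusForbidden:
		// Auth failure: a retry cannot help, and repeated failed attempts
		// risk locking the user out.
		return false
	case status == http.StatusNotFound:
		// Under GitLab's convention, 404 may really mean 403, so the safe
		// choice is not to retry, even when the true cause was transient.
		return false
	default:
		return false
	}
}

func main() {
	for _, status := range []int{401, 403, 404, 503} {
		fmt.Printf("%d -> retry: %v\n", status, shouldRetry(status))
	}
}
```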

If the server is encountering some kind of outage / downtime / overload, I think it should indicate an outage / downtime / overload (e.g. a 503). That reveals nothing about the authentication rules to the client, but it allows the client to make a sensible retry decision.
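For instance, a 503 carrying a Retry-After header would give the client exactly that signal while revealing nothing about authorization. A sketch of the client-side decision such a response would enable (assumed behavior, not podman's actual code):

```go
// Sketch of the retry decision a 503 + Retry-After response would enable,
// as argued above. Assumed client behavior, not podman's actual code.
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// backoffFor returns how long to wait before retrying. ok is false when the
// response does not signal a transient condition and a retry is pointless.
func backoffFor(resp *http.Response) (wait time.Duration, ok bool) {
	if resp.StatusCode != http.StatusServiceUnavailable {
		return 0, false
	}
	// Retry-After may carry a delay in seconds (it may also be an HTTP
	// date, which this sketch does not handle).
	if secs, err := strconv.Atoi(resp.Header.Get("Retry-After")); err == nil {
		return time.Duration(secs) * time.Second, true
	}
	return 5 * time.Second, true // no usable header: default delay
}

func main() {
	resp := &http.Response{
		StatusCode: http.StatusServiceUnavailable,
		Header:     http.Header{"Retry-After": []string{"30"}},
	}
	if wait, ok := backoffFor(resp); ok {
		fmt.Println("transient failure, retry in", wait)
	}
}
```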