cog: [Cog v0.8.0 error after upgrading from v0.7.2] Error on Cog build: exec: /sbin/ldconfig.real: not found

Impact: I’m unable to build any image using Cog, and therefore cannot deploy any models to Replicate.


On both Lambdalabs and TensorDock:

sudo cog build

cog.yaml:

# Configuration for Cog ⚙️
# Reference: https://github.com/replicate/cog/blob/main/docs/yaml.md

build:
  # set to true if your model requires a GPU
  gpu: true

  cuda: "11.8"

  # python version in the form '3.8' or '3.8.12'
  python_version: "3.10"

  # a list of packages in the format <package-name>==<version>
  python_packages:
    - "torch==2.0.0"
    - "transformers==4.30.1"
    - "sentencepiece==0.1.97"
    - "accelerate==0.20.3"
    # https://github.com/oobabooga/text-generation-webui/blob/main/docs/LLaMA-model.md#option-2-convert-the-weights-yourself
    - "protobuf==3.20.1"
    - "auto-gptq==0.2.2"
  
# predict.py defines how predictions are run on your model
predict: "predict.py:Predictor"

I receive the following error logs:

 => ERROR [stage-1  3/11] RUN --mount=type=cache,target=/var/cache/apt apt-get update -qq && apt-get install -qqy --no-install-recom  16.4s
------                                                                                                                                      
 > [stage-1  3/11] RUN --mount=type=cache,target=/var/cache/apt apt-get update -qq && apt-get install -qqy --no-install-recommends      make        build-essential         libssl-dev      zlib1g-dev      libbz2-dev      libreadline-dev         libsqlite3-dev  wget    curl    llvm        libncurses5-dev         libncursesw5-dev        xz-utils        tk-dev  libffi-dev      liblzma-dev     git     ca-certificates    && rm -rf /var/lib/apt/lists/*:                                                                                                              
#0 13.37 debconf: delaying package configuration, since apt-utils is not installed    

....

#0 20.67 Setting up tk-dev:amd64 (8.6.11+1build2) ...
#0 20.67 Processing triggers for libc-bin (2.35-0ubuntu3.1) ...
#0 20.67 /usr/sbin/ldconfig: 16: exec: /sbin/ldconfig.real: not found
#0 20.67 /usr/sbin/ldconfig: 16: exec: /sbin/ldconfig.real: not found
#0 20.67 dpkg: error processing package libc-bin (--configure):
#0 20.67  installed libc-bin package post-installation script subprocess returned error exit status 127
#0 20.68 Errors were encountered while processing:
#0 20.68  libc-bin
#0 20.69 E: Sub-process /usr/bin/dpkg returned an error code (1)
------
Dockerfile:13
--------------------
  12 |     ENV PATH="/root/.pyenv/shims:/root/.pyenv/bin:$PATH"
  13 | >>> RUN --mount=type=cache,target=/var/cache/apt apt-get update -qq && apt-get install -qqy --no-install-recommends \
  14 | >>>      make \
  15 | >>>      build-essential \
  16 | >>>      libssl-dev \
  17 | >>>      zlib1g-dev \
  18 | >>>      libbz2-dev \
  19 | >>>      libreadline-dev \
  20 | >>>      libsqlite3-dev \
  21 | >>>      wget \
  22 | >>>      curl \
  23 | >>>      llvm \
  24 | >>>      libncurses5-dev \
  25 | >>>      libncursesw5-dev \
  26 | >>>      xz-utils \
  27 | >>>      tk-dev \
  28 | >>>      libffi-dev \
  29 | >>>      liblzma-dev \
  30 | >>>      git \
  31 | >>>      ca-certificates \
  32 | >>>      && rm -rf /var/lib/apt/lists/*
  33 |     RUN curl -s -S -L https://raw.githubusercontent.com/pyenv/pyenv-installer/master/bin/pyenv-installer | bash && \
--------------------
ERROR: failed to solve: process "/bin/sh -c apt-get update -qq && apt-get install -qqy --no-install-recommends \tmake \tbuild-essential \tlibssl-dev \tzlib1g-dev \tlibbz2-dev \tlibreadline-dev \tlibsqlite3-dev \twget \tcurl \tllvm \tlibncurses5-dev \tlibncursesw5-dev \txz-utils \ttk-dev \tlibffi-dev \tliblzma-dev \tgit \tca-certificates \t&& rm -rf /var/lib/apt/lists/*" did not complete successfully: exit code: 100
ⅹ Failed to build Docker image: exit status 1
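
To find the failing package in a long build log like the one above, the dpkg error lines can be filtered out. This is only a minimal sketch: `failed_packages` is a hypothetical helper name, and it assumes the log uses the same `dpkg: error processing package <name>` wording shown in the output above.

```shell
# failed_packages: read a build log on stdin and print every package that
# dpkg reported as failing, one per line, without duplicates.
# Assumes the "dpkg: error processing package <name> (...)" line format.
failed_packages() {
  sed -n 's/.*dpkg: error processing package \([^ ]*\) .*/\1/p' | sort -u
}
```

For example, saving the output with `sudo cog build 2>&1 | tee build.log` and then running `failed_packages < build.log` would narrow the log shown here down to libc-bin.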

About this issue

  • State: closed
  • Created a year ago
  • Comments: 22 (11 by maintainers)

Most upvoted comments

Hi @Glavin001. Thanks for your help and patience as we try to debug this issue. I apologize for the inconvenience this caused.

We just released Cog v0.8.2. This release includes #1231, which reverts #1161, which we believe to be the cause of the regression you’re seeing.

Please give that a try when you have a chance and let us know if you’re still having this issue. Thanks! 🙏
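
For anyone who wants to script the upgrade, here is a rough sketch that builds the download URL for a given release. `cog_release_url` is a hypothetical helper, and it assumes the v0.8.2 release assets follow the same `cog_<OS>_<ARCH>` naming as the v0.7.2 download URL used elsewhere in this thread.

```shell
# cog_release_url VERSION [OS] [ARCH]: print the download URL of a Cog binary.
# OS and ARCH default to the values reported by uname on the current machine.
# Assumption: every release uses the cog_<OS>_<ARCH> asset naming.
cog_release_url() {
  version="$1"
  os="${2:-$(uname -s)}"
  arch="${3:-$(uname -m)}"
  printf 'https://github.com/replicate/cog/releases/download/v%s/cog_%s_%s\n' \
    "$version" "$os" "$arch"
}
```

It could then be used as `sudo curl -o /usr/local/bin/cog -L "$(cog_release_url 0.8.2)"` followed by `sudo chmod +x /usr/local/bin/cog`.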

Hi everyone, apologies - I pushed this change in hopes of making the image smaller and faster to build.

It seems like you might have an older version of the CUDA base image. The current version of 11.8.0-cudnn8-devel-ubuntu22.04 already has libc-bin installed, and also has /sbin/ldconfig.real. My guess is that the rm -rf /var/lib/apt/lists/* may have been important. Could you please post the output of docker images --no-trunc|grep cuda?
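
One way to check whether a given base image really contains /sbin/ldconfig.real is to test for the file directly. The helper below is only a sketch; `has_ldconfig_real` is a hypothetical name, and the optional root argument exists just so it can be pointed at an unpacked filesystem instead of `/`.

```shell
# has_ldconfig_real [ROOT]: succeed if ROOT/sbin/ldconfig.real exists.
# ROOT defaults to the empty string, so the path checked is /sbin/ldconfig.real,
# which is what you want when running inside the container itself.
has_ldconfig_real() {
  root="${1:-}"
  [ -e "${root}/sbin/ldconfig.real" ]
}
```

Inside a shell started with something like `docker run --rm -it nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04 bash`, running `has_ldconfig_real && echo present || echo missing` would show whether the locally cached image has the file.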

Here’s the cog debug diff between v0.7.2 and v0.8.0:

--- v0.7.2.txt	2023-07-11 05:31:08
+++ v0.8.0.txt	2023-07-11 05:31:28
@@ -1,18 +1,14 @@
 $ sudo cog debug
-⚠ Cog doesn't know if CUDA 11.8 is compatible with PyTorch 2.0.0. This might cause CUDA problems.
-# syntax = docker/dockerfile:1.2
+#syntax=docker/dockerfile:1.4
+FROM curlimages/curl AS downloader
+ARG TINI_VERSION=0.19.0
+WORKDIR /tmp
+RUN curl -fsSL -O "https://github.com/krallin/tini/releases/download/v${TINI_VERSION}/tini" && chmod +x tini
 FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
 ENV DEBIAN_FRONTEND=noninteractive
 ENV PYTHONUNBUFFERED=1
 ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu:/usr/local/nvidia/lib64:/usr/local/nvidia/bin
-RUN --mount=type=cache,target=/var/cache/apt set -eux; \
-apt-get update -qq; \
-apt-get install -qqy --no-install-recommends curl; \
-rm -rf /var/lib/apt/lists/*; \
-TINI_VERSION=v0.19.0; \
-TINI_ARCH="$(dpkg --print-architecture)"; \
-curl -sSL -o /sbin/tini "https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-${TINI_ARCH}"; \
-chmod +x /sbin/tini
+COPY --link --from=downloader /tmp/tini /sbin/tini
 ENTRYPOINT ["/sbin/tini", "--"]
 ENV PATH="/root/.pyenv/shims:/root/.pyenv/bin:$PATH"
 RUN --mount=type=cache,target=/var/cache/apt apt-get update -qq && apt-get install -qqy --no-install-recommends \
@@ -40,9 +36,9 @@
         pyenv install-latest "3.10" && \
         pyenv global $(pyenv install-latest --print "3.10") && \
         pip install "wheel<1"
-COPY .cog/tmp/build4048584965/cog-0.0.1.dev-py3-none-any.whl /tmp/cog-0.0.1.dev-py3-none-any.whl
+COPY .cog/tmp/build4127551442/cog-0.0.1.dev-py3-none-any.whl /tmp/cog-0.0.1.dev-py3-none-any.whl
 RUN --mount=type=cache,target=/root/.cache/pip pip install /tmp/cog-0.0.1.dev-py3-none-any.whl
-COPY .cog/tmp/build4048584965/requirements.txt /tmp/requirements.txt
+COPY .cog/tmp/build4127551442/requirements.txt /tmp/requirements.txt
 RUN --mount=type=cache,target=/root/.cache/pip pip install -r /tmp/requirements.txt
 WORKDIR /src
 EXPOSE 5000
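
A comparison like the one above can be reproduced roughly as follows. This is a sketch, not necessarily how the diff here was produced: `diff_debug_output` is a hypothetical wrapper, and the file names simply mirror the ones in the diff header.

```shell
# diff_debug_output OLD NEW: print a unified diff of two saved `cog debug` outputs.
# diff exits with status 1 when the files differ, which is expected here,
# so only an exit status above 1 (a real error) is treated as a failure.
diff_debug_output() {
  diff -u "$1" "$2" || [ "$?" -eq 1 ]
}

# Usage (each capture assumes the matching Cog version is installed at the time;
# 2>&1 is an assumption so that warnings end up in the file as well):
#   sudo cog debug > v0.7.2.txt 2>&1
#   sudo cog debug > v0.8.0.txt 2>&1
#   diff_debug_output v0.7.2.txt v0.8.0.txt
```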

v0.8.0 is the issue.

Workaround: Downgrading to v0.7.2 fixes the issue! 🎉 ✅

$ sudo curl -o /usr/local/bin/cog -L "https://github.com/replicate/cog/releases/download/v0.7.2/cog_Linux_x86_64"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 9444k  100 9444k    0     0  10.7M      0 --:--:-- --:--:-- --:--:-- 56.5M
$ sudo chmod +x /usr/local/bin/cog
$ cog --version
cog version 0.7.2 (built 2023-05-23T10:20:56Z)
$ sudo cog --version
cog version 0.7.2 (built 2023-05-23T10:20:56Z)

$ sudo cog debug
⚠ Cog doesn't know if CUDA 11.8 is compatible with PyTorch 2.0.0. This might cause CUDA problems.
# syntax = docker/dockerfile:1.2
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu:/usr/local/nvidia/lib64:/usr/local/nvidia/bin
RUN --mount=type=cache,target=/var/cache/apt set -eux; \
apt-get update -qq; \
apt-get install -qqy --no-install-recommends curl; \
rm -rf /var/lib/apt/lists/*; \
TINI_VERSION=v0.19.0; \
TINI_ARCH="$(dpkg --print-architecture)"; \
curl -sSL -o /sbin/tini "https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-${TINI_ARCH}"; \
chmod +x /sbin/tini
ENTRYPOINT ["/sbin/tini", "--"]
ENV PATH="/root/.pyenv/shims:/root/.pyenv/bin:$PATH"
RUN --mount=type=cache,target=/var/cache/apt apt-get update -qq && apt-get install -qqy --no-install-recommends \
        make \
        build-essential \
        libssl-dev \
        zlib1g-dev \
        libbz2-dev \
        libreadline-dev \
        libsqlite3-dev \
        wget \
        curl \
        llvm \
        libncurses5-dev \
        libncursesw5-dev \
        xz-utils \
        tk-dev \
        libffi-dev \
        liblzma-dev \
        git \
        ca-certificates \
        && rm -rf /var/lib/apt/lists/*
RUN curl -s -S -L https://raw.githubusercontent.com/pyenv/pyenv-installer/master/bin/pyenv-installer | bash && \
        git clone https://github.com/momo-lab/pyenv-install-latest.git "$(pyenv root)"/plugins/pyenv-install-latest && \
        pyenv install-latest "3.10" && \
        pyenv global $(pyenv install-latest --print "3.10") && \
        pip install "wheel<1"
COPY .cog/tmp/build4048584965/cog-0.0.1.dev-py3-none-any.whl /tmp/cog-0.0.1.dev-py3-none-any.whl
RUN --mount=type=cache,target=/root/.cache/pip pip install /tmp/cog-0.0.1.dev-py3-none-any.whl
COPY .cog/tmp/build4048584965/requirements.txt /tmp/requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip pip install -r /tmp/requirements.txt
WORKDIR /src
EXPOSE 5000
CMD ["python", "-m", "cog.server.http"]
COPY . /src