milvus: [Bug]: Milvus2.3.1 standalone exited with code 132

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.3.1
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka):    no
- SDK version(e.g. pymilvus v2.0.0rc2): no 
- OS(Ubuntu or CentOS): ubuntu20.04
- CPU/Memory: Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz
- GPU: no
- Others: no

Current Behavior

Running milvusdb/milvus:v2.2.14 works without any problems:

version: "3.5"

services:
  etcd:
    container_name: svddb-milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    restart: always
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:2379/health"]
      interval: 30s
      timeout: 20s
      retries: 3
    logging:
      driver: "json-file"
      options:
        max-file: "1"
        max-size: "50m"

  minio:
    container_name: svddb-milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    restart: always
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data
    command: minio server /minio_data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3
    logging:
      driver: "json-file"
      options:
        max-file: "1"
        max-size: "50m"

  standalone:
    container_name: svddb-milvus-standalone
    image: milvusdb/milvus:v2.2.14
    command: ["milvus", "run", "standalone"]
    restart: always
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"
    logging:
      driver: "json-file"
      options:
        max-file: "1"
        max-size: "50m"

  zilliz_attu:
    container_name: zilliz_attu
    image: zilliz/attu:v2.2.8
    restart: always
    environment:
      HOST_URL: http://0.0.0.0:8000
      MILVUS_URL: standalone:19530
    ports:
      - "8000:3000"

networks:
  default:
    name: milvus

The 2.2.14 setup above runs with no problems at all.


But running milvusdb/milvus:v2.3.1 does not work. There is not even an error log; the container simply exits with code 132.

version: "3.5"

services:
  etcd:
    container_name: svddb-milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    restart: always
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:2379/health"]
      interval: 30s
      timeout: 20s
      retries: 3
    logging:
      driver: "json-file"
      options:
        max-file: "1"
        max-size: "50m"

  minio:
    container_name: svddb-milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    restart: always
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3
    logging:
      driver: "json-file"
      options:
        max-file: "1"
        max-size: "50m"

  standalone:
    container_name: svddb-milvus-standalone
    image: milvusdb/milvus:v2.3.1
    command: ["milvus", "run", "standalone"]
    restart: always
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"
    logging:
      driver: "json-file"
      options:
        max-file: "1"
        max-size: "50m"

  zilliz_attu:
    container_name: zilliz_attu
    image: zilliz/attu:v2.3.1
    restart: always
    environment:
      HOST_URL: http://0.0.0.0:8000
      MILVUS_URL: standalone:19530
    ports:
      - "8000:3000"

networks:
  default:
    name: milvus

Started with: sudo rm -rf volumes && docker-compose down && sudo rm -rf volumes && docker-compose up -d && docker ps -a

The result is that the standalone container keeps restarting and exiting with code 132:

╰─➤  docker ps -a | grep milvus
91a8c41de109   milvusdb/milvus:v2.3.1                     "/tini -- milvus run…"   8 seconds ago    Restarting (132) Less than a second ago                                                                                                                                                                                                  svddb-milvus-standalone
e0ffb572fb41   minio/minio:RELEASE.2023-03-20T20-16-18Z   "/usr/bin/docker-ent…"   9 seconds ago    Up 8 seconds (health: starting)           9000/tcp                                                                                                                                                                                       svddb-milvus-minio
0653dcab47d8   quay.io/coreos/etcd:v3.5.5                 "etcd -advertise-cli…"   9 seconds ago    Up 8 seconds (health: starting)           2379-2380/tcp                                                                                                                                                                                  svddb-milvus-etcd
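
For readers unfamiliar with the number: shells and Docker conventionally report 128 + N when a process is killed by signal N, so 132 corresponds to signal 4. A minimal sketch of that decoding follows; the helper name describe_exit_code is made up for illustration and is not part of Docker or Milvus.

```python
import signal


def describe_exit_code(code):
    """Turn a shell/Docker-style exit code into a readable description.

    Assumes the common convention that a code above 128 means
    "terminated by signal (code - 128)".
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            return "exit code %d (128 + unknown signal %d)" % (code, signum)
        return "killed by %s (signal %d)" % (name, signum)
    return "exited normally with status %d" % code


if __name__ == "__main__":
    print(describe_exit_code(132))
```

On Linux this prints "killed by SIGILL (signal 4)", which matches the SIGILL seen in the gdb session quoted in the comments.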

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

no

Anything else?

My exact CPU model:

─➤  lscpu
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Byte Order:                         Little Endian
Address sizes:                      43 bits physical, 48 bits virtual
CPU(s):                             16
On-line CPU(s) list:                0-15
Thread(s) per core:                 1
Core(s) per socket:                 1
Socket(s):                          16
NUMA node(s):                       1
Vendor ID:                          GenuineIntel
CPU family:                         6
Model:                              45
Model name:                         Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz
Stepping:                           7
CPU MHz:                            1900.000
BogoMIPS:                           3800.00
Hypervisor vendor:                  VMware
Virtualization type:                full
L1d cache:                          512 KiB
L1i cache:                          512 KiB
L2 cache:                           4 MiB
L3 cache:                           240 MiB
NUMA node0 CPU(s):                  0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        KVM: Vulnerable
Vulnerability L1tf:                 Mitigation; PTE Inversion
Vulnerability Mds:                  Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Meltdown:             Mitigation; PTI
Vulnerability Mmio stale data:      Unknown: No mitigations
Vulnerability Retbleed:             Mitigation; IBRS
Vulnerability Spec store bypass:    Vulnerable
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; IBRS, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx hypervisor lahf_lm pti ibrs ibpb stibp tsc_adjust arat arch_capabilities
╰─➤  cat /etc/os-release          
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
╰─➤  uname -a
Linux admini 5.4.0-163-generic #180-Ubuntu SMP Tue Sep 5 13:21:23 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
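
The Flags line in the lscpu output above lists avx but not avx2, fma, f16c, or bmi2. A SIGILL at startup usually means the binary executed an instruction the CPU does not expose, so comparing the flags against a list of candidate extensions is a quick first check. The sketch below does that. The candidate list is only an assumption for illustration; this report does not establish which extension the v2.3.1 image actually requires.

```python
import os

# Flags copied from the lscpu output in this report.
ISSUE_FLAGS = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc "
    "arch_perfmon nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq "
    "ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave "
    "avx hypervisor lahf_lm pti ibrs ibpb stibp tsc_adjust arat "
    "arch_capabilities"
)

# ASSUMPTION: extensions worth checking; not a confirmed Milvus requirement.
CANDIDATE_FLAGS = ("sse4_2", "avx", "avx2", "fma", "f16c", "bmi2", "avx512f")


def check_flags(flags_text, wanted=CANDIDATE_FLAGS):
    """Return {flag: True/False} for each wanted flag in a flags string."""
    present = set(flags_text.split())
    return {name: name in present for name in wanted}


def read_host_flags(path="/proc/cpuinfo"):
    """Return the flags string of the first CPU on a Linux host, or ''."""
    if not os.path.exists(path):
        return ""
    with open(path) as handle:
        for line in handle:
            if line.startswith("flags"):
                return line.split(":", 1)[1]
    return ""


if __name__ == "__main__":
    # Prefer the local machine's flags; fall back to the ones in this report.
    flags = read_host_flags() or ISSUE_FLAGS
    for name, ok in check_flags(flags).items():
        print("%-8s %s" % (name, "present" if ok else "MISSING"))
```

For the flags in this report, check_flags(ISSUE_FLAGS) marks sse4_2 and avx as present and avx2, fma, f16c, bmi2, and avx512f as missing. Note that the machine is a VMware guest, so the flags shown are whatever the hypervisor exposes to the VM.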

About this issue

  • State: open
  • Created 9 months ago
  • Comments: 25 (13 by maintainers)

Most upvoted comments

@crackcomm We currently have some clues but are still working on a proper solution. Could you help by rebuilding Milvus in the following ways:

It still fails on the commit mentioned above:

Thread 1 "milvus" received signal SIGILL, Illegal instruction.
0x00007ffff13d2b0b in _GLOBAL__sub_I.00102_Hazptr.cpp () from /home/pah/ocxmr-repos/milvus/milvus/internal/core/output/lib/libknowhere.so
(gdb) bt
#0  0x00007ffff13d2b0b in _GLOBAL__sub_I.00102_Hazptr.cpp () from /home/pah/ocxmr-repos/milvus/milvus/internal/core/output/lib/libknowhere.so
#1  0x00007ffff7fe0b9a in call_init (l=<optimized out>, argc=argc@entry=1, argv=argv@entry=0x7fffffffd7b8, env=env@entry=0x7fffffffd7c8)
    at dl-init.c:72

Some AVX instructions are used in the folly library; we are working on a fix.

Yes.