milvus: [Bug]: Milvus2.3.1 standalone exited with code 132
Is there an existing issue for this?
- I have searched the existing issues
Environment
- Milvus version: 2.3.1
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): no
- SDK version(e.g. pymilvus v2.0.0rc2): no
- OS(Ubuntu or CentOS): ubuntu20.04
- CPU/Memory: Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz
- GPU: no
- Others: no
Current Behavior
Running milvusdb/milvus:v2.2.14 works without any problem:
version: "3.5"
services:
  etcd:
    container_name: svddb-milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    restart: always
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:2379/health"]
      interval: 30s
      timeout: 20s
      retries: 3
    logging:
      driver: "json-file"
      options:
        max-file: "1"
        max-size: "50m"
  minio:
    container_name: svddb-milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    restart: always
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data
    command: minio server /minio_data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3
    logging:
      driver: "json-file"
      options:
        max-file: "1"
        max-size: "50m"
  standalone:
    container_name: svddb-milvus-standalone
    image: milvusdb/milvus:v2.2.14
    command: ["milvus", "run", "standalone"]
    restart: always
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"
    logging:
      driver: "json-file"
      options:
        max-file: "1"
        max-size: "50m"
  zilliz_attu:
    container_name: zilliz_attu
    image: zilliz/attu:v2.2.8
    restart: always
    environment:
      HOST_URL: http://0.0.0.0:8000
      MILVUS_URL: standalone:19530
    ports:
      - "8000:3000"
networks:
  default:
    name: milvus
The 2.2.14 setup above runs without any problems.
But running milvusdb/milvus:v2.3.1 fails: there is not even an error log, the container just exits with code 132.
version: "3.5"
services:
  etcd:
    container_name: svddb-milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    restart: always
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:2379/health"]
      interval: 30s
      timeout: 20s
      retries: 3
    logging:
      driver: "json-file"
      options:
        max-file: "1"
        max-size: "50m"
  minio:
    container_name: svddb-milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    restart: always
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3
    logging:
      driver: "json-file"
      options:
        max-file: "1"
        max-size: "50m"
  standalone:
    container_name: svddb-milvus-standalone
    image: milvusdb/milvus:v2.3.1
    command: ["milvus", "run", "standalone"]
    restart: always
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"
    logging:
      driver: "json-file"
      options:
        max-file: "1"
        max-size: "50m"
  zilliz_attu:
    container_name: zilliz_attu
    image: zilliz/attu:v2.3.1
    restart: always
    environment:
      HOST_URL: http://0.0.0.0:8000
      MILVUS_URL: standalone:19530
    ports:
      - "8000:3000"
networks:
  default:
    name: milvus
I start it with:
sudo rm -rf volumes && docker-compose down && sudo rm -rf volumes && docker-compose up -d && docker ps -a
The result is an endless restart loop, with the container exiting with code 132 each time:
╰─➤ docker ps -a | grep milvus
91a8c41de109 milvusdb/milvus:v2.3.1 "/tini -- milvus run…" 8 seconds ago Restarting (132) Less than a second ago svddb-milvus-standalone
e0ffb572fb41 minio/minio:RELEASE.2023-03-20T20-16-18Z "/usr/bin/docker-ent…" 9 seconds ago Up 8 seconds (health: starting) 9000/tcp svddb-milvus-minio
0653dcab47d8 quay.io/coreos/etcd:v3.5.5 "etcd -advertise-cli…" 9 seconds ago Up 8 seconds (health: starting) 2379-2380/tcp svddb-milvus-etcd
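For context on the status above: when a process is killed by a signal, the exit code reported is conventionally 128 plus the signal number, so 132 maps to signal 4, which is SIGILL (illegal instruction) on Linux. The sketch below shows that arithmetic; the function name `signal_from_exit_code` is hypothetical and not part of Docker or Milvus.

```python
import signal
from typing import Optional


def signal_from_exit_code(code: int) -> Optional[str]:
    """Return the signal name for a shell-style exit code above 128, else None."""
    if code <= 128:
        # Codes up to 128 are ordinary exit statuses, not signal deaths.
        return None
    try:
        # 128 + N means "terminated by signal N".
        return signal.Signals(code - 128).name
    except ValueError:
        # No signal with that number exists on this platform.
        return None


print(signal_from_exit_code(132))
```

A SIGILL at startup with no log output is consistent with the binary executing a CPU instruction that the host processor does not support.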
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
no
Anything else?
My exact CPU model:
╰─➤ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 43 bits physical, 48 bits virtual
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 16
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Model name: Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz
Stepping: 7
CPU MHz: 1900.000
BogoMIPS: 3800.00
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 512 KiB
L1i cache: 512 KiB
L2 cache: 4 MiB
L3 cache: 240 MiB
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: KVM: Vulnerable
Vulnerability L1tf: Mitigation; PTE Inversion
Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Unknown: No mitigations
Vulnerability Retbleed: Mitigation; IBRS
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; IBRS, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx hypervisor lahf_lm pti ibrs ibpb stibp tsc_adjust arat arch_capabilities
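Since exit code 132 suggests an illegal instruction, it can help to see which SIMD extensions this CPU lacks. The sketch below checks the flags reported above against a list of extensions. The list in `CANDIDATE_EXTENSIONS` is an illustrative assumption, not the confirmed requirements of any Milvus build, and `missing_extensions` is a hypothetical helper name.

```python
# Flags copied from the lscpu output in this report (line wraps rejoined).
LSCPU_FLAGS = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc "
    "arch_perfmon nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq "
    "ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave "
    "avx hypervisor lahf_lm pti ibrs ibpb stibp tsc_adjust arat "
    "arch_capabilities"
)

# Assumption: an illustrative set of extensions that optimized builds often
# use. It is NOT taken from the Milvus build configuration.
CANDIDATE_EXTENSIONS = ["sse4_2", "avx", "avx2", "fma", "f16c", "bmi2", "avx512f"]


def missing_extensions(flags: str, wanted: list) -> list:
    """Return the entries of `wanted` absent from a space-separated flag string."""
    present = set(flags.split())
    return [name for name in wanted if name not in present]


print(missing_extensions(LSCPU_FLAGS, CANDIDATE_EXTENSIONS))
```

On this machine `sse4_2` and `avx` are present, while `avx2` and the other newer extensions in the list are not. On a live host the same check could read the `flags` line from /proc/cpuinfo instead of a pasted string.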
╰─➤ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
╰─➤ uname -a
Linux admini 5.4.0-163-generic #180-Ubuntu SMP Tue Sep 5 13:21:23 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
About this issue
- State: open
- Created 9 months ago
- Comments: 25 (13 by maintainers)
Commits related to this issue
- fix: update folly to resolve simd issue (#28878) related #27552 , after this, milvus could run successfully on sse4.2 only machine Signed-off-by: chasingegg <chao.gao@zilliz.com> — committed to milvus-io/milvus by chasingegg 7 months ago
- fix: [2.3] update folly to resolve simd issue (#28879) related issue: https://github.com/milvus-io/milvus/issues/27552 , after this, milvus could run successfully on sse4.2 only machine pr: #28878 S... — committed to milvus-io/milvus by chasingegg 7 months ago
- enhance: Bump Knowhere's version to 2.2.3 (#29035) Knowhere's new bug fix version: https://github.com/zilliztech/knowhere/releases/tag/v2.2.3 related issues: #28821 #28810 #27552 #27516 #28603 #2148... — committed to milvus-io/milvus by liliu-z 7 months ago
@crackcomm Currently we have some clues, but we are still working on a proper solution. Could you help by rebuilding Milvus in the following ways,
There are some AVX instructions used in the folly lib; we are working on the fix.
Yes.