frigate: [Config Support]: Whole Machine Crashing - looking for some tips
Describe the problem you are having
I have two Docker hosts, each with a Coral. Frigate seems to cause the whole host to freeze completely (the console is unresponsive) at frequent intervals: right now about every 48 hours on average, but it’s not consistent. I’ve moved the Frigate container to the other host and cleared out all the other containers, and the freeze follows Frigate.
It’s likely that Frigate is pushing the hosts much harder than any other container and is perhaps hitting a bug somewhere in the hardware or OS. The hosts are Beelink devices running the latest Ubuntu.
Looking for some advice - has anyone seen this sort of behavior and identified the cause?
This has been happening for many months, so it is not related to the Frigate beta or to any particular Frigate version (and this is likely NOT a Frigate bug).
Version
0.13 Beta 3
Frigate config file
database:
  path: /db/frigate.db
mqtt:
  host: 10.2.1.171
  user: mqtt
  password: xxx
ffmpeg:
  # hwaccel_args: -c:v h264_qsv
  # hwaccel_args: preset-intel-qsv-h264
  hwaccel_args: preset-vaapi
logger:
  # Optional: default log level (default: shown below)
  default: warning
  # Optional: module by module log level configuration
  logs:
    frigate.mqtt: error
detectors:
  coral:
    type: edgetpu
    device: usb
motion:
  # Optional: The threshold passed to cv2.threshold to determine if a pixel is different enough to be counted as motion. (default: shown below)
  # Increasing this value will make motion detection less sensitive and decreasing it will make motion detection more sensitive.
  # The value should be between 1 and 255.
  threshold: 40
  contour_area: 20
  lightning_threshold: 0.7
detect:
  max_disappeared: 500
  width: 1280
  # Optional: height of the frame for the input with the detect role (default: shown below)
  height: 720
timestamp_style:
  # Optional: Position of the timestamp (default: shown below)
  # "tl" (top left), "tr" (top right), "bl" (bottom left), "br" (bottom right)
  position: tl
  # Optional: Format specifier conform to the Python package "datetime" (default: shown below)
  # Additional Examples:
  # german: "%d.%m.%Y %H:%M:%S"
  format: '%m/%d/%Y %H:%M:%S'
  # Optional: Color of font
  color:
    # All Required when color is specified (default: shown below)
    red: 255
    green: 255
    blue: 255
  # Optional: Line thickness of font (default: shown below)
  thickness: 1
  # Optional: Effect of lettering (default: shown below)
  # None (No effect),
  # "solid" (solid background in inverse color of font)
  # "shadow" (shadow for font)
  effect: solid
birdseye:
  # Optional: Enable birdseye view (default: shown below)
  enabled: true
  # Optional: Width of the output resolution (default: shown below)
  width: 1280
  # Optional: Height of the output resolution (default: shown below)
  height: 720
  # Optional: Encoding quality of the mpeg1 feed (default: shown below)
  # 1 is the highest quality, and 31 is the lowest. Lower quality feeds utilize less CPU resources.
  quality: 8
  # Optional: Mode of the view. Available options are: objects, motion, and continuous
  # objects - cameras are included if they have had a tracked object within the last 30 seconds
  # motion - cameras are included if motion was detected in the last 30 seconds
  # continuous - all cameras are included always
  mode: objects
  restream: true
objects:
  track:
    - person
    - cat
record:
  enabled: true
  events:
    retain:
      default: 10
      mode: active_objects
    pre_capture: 5
    post_capture: 15
  sync_on_startup: true
  expire_interval: 60
# Optional: Configuration for the jpg snapshots written to the clips directory for each event
# NOTE: Can be overridden at the camera level
snapshots:
  # Optional: Enable writing jpg snapshot to /media/frigate/clips (default: shown below)
  enabled: true
  # Optional: save a clean PNG copy of the snapshot image (default: shown below)
  clean_copy: true
  # Optional: print a timestamp on the snapshots (default: shown below)
  timestamp: false
  # Optional: draw bounding box on the snapshots (default: shown below)
  bounding_box: false
  # Optional: crop the snapshot (default: shown below)
  crop: false
  # Optional: height to resize the snapshot to (default: original size)
  height: 175
  # Optional: Restrict snapshots to objects that entered any of the listed zones (default: no required zones)
  required_zones: []
  # Optional: Camera override for retention settings (default: global values)
  retain:
    # Required: Default retention days (default: shown below)
    default: 10
    # Optional: Per object retention days
    objects:
      person: 15
  # Optional: quality of the encoded jpeg, 0-100 (default: shown below)
  quality: 70
ui:
  # Optional: Set the default live mode for cameras in the UI (default: shown below)
  live_mode: mse
  # Optional: Set a timezone to use in the UI (default: use browser local time)
  timezone: America/New_York
  # Optional: Use an experimental recordings / camera view UI (default: shown below)
  use_experimental: false
  # Optional: Set the time format used.
  # Options are browser, 12hour, or 24hour (default: shown below)
  time_format: 12hour
  # Optional: Set the date style for a specified length.
  # Options are: full, long, medium, short
  # Examples:
  # short: 2/11/23
  # medium: Feb 11, 2023
  # full: Saturday, February 11, 2023
  # (default: shown below).
  date_style: full
  # Optional: Set the time style for a specified length.
  # Options are: full, long, medium, short
  # Examples:
  # short: 8:14 PM
  # medium: 8:15:22 PM
  # full: 8:15:22 PM Mountain Standard Time
  # (default: shown below).
  time_style: medium
  # Optional: Ability to manually override the date / time styling to use strftime format
  # https://www.gnu.org/software/libc/manual/html_node/Formatting-Calendar-Time.html
  # possible values are shown above (default: not set)
  strftime_fmt: '%Y/%m/%d %H:%M'
telemetry:
  # Optional: Enabled network interfaces for bandwidth stats monitoring (default: shown below)
  #network_interfaces:
  # - eth
  # - enp
  # - eno
  # - ens
  # - wl
  # - lo
  # Optional: Configure system stats
  stats:
    # Enable AMD GPU stats (default: shown below)
    # amd_gpu_stats: True
    # Enable Intel GPU stats (default: shown below)
    intel_gpu_stats: true
    # Enable network bandwidth stats monitoring for camera ffmpeg processes, go2rtc, and object detectors. (default: shown below)
    network_bandwidth: false
  # Optional: Enable the latest version outbound check (default: shown below)
  # NOTE: If you use the HomeAssistant integration, disabling this will prevent it from reporting new versions
  version_check: true
cameras:
REMOVED - but I have about 15
I’m also including my Docker Compose file in case it gives anyone ideas:
version: "3"
services:
  frigate:
    image: ghcr.io/blakeblackshear/frigate:0.13.0-beta3
    # image: ghcr.io/blakeblackshear/frigate:dev-c743dfd
    shm_size: "2048mb"
    container_name: frigate
    privileged: true
    devices:
      - /dev/dri:/dev/dri
    volumes:
      - /disk1/docker/frigate/config:/config
      # - /disk1/docker/frigate/db:/db
      # - /disk1/docker/frigate/media:/media/frigate
      - /etc/localtime:/etc/localtime:ro
      - /dev/bus/usb:/dev/bus/usb
    environment:
      - PUID=0
      - PGID=0
      - TZ=America/New_York
      - FRIGATE_RTSP_PASSWORD="xxx"
      - PLUS_API_KEY=xxx
    restart: unless-stopped
Relevant log output
None that I can find relevant.
Frigate stats
No response
Operating system
Other
Install method
Docker Compose
Coral version
USB
Any other information that may be helpful
No response
About this issue
- State: open
- Created 8 months ago
- Comments: 35 (6 by maintainers)
@ggidofalvy-tc
That’s how I have done it:
@madasus Yes, this is using OpenVINO as the detector and not the Coral. As far as I know, you cannot use a Coral and YOLO models together.
I have a similar issue, running an i5-6500T, no external accelerator, and so far I’ve been able to ascertain the following:
Here’s my config using three random camera feeds from the Internet that I use for debugging, currently the hardware acceleration for decoding/encoding is commented out:
I tried to grab a kernel crash dump via kdump, and also tried kernel netconsole (dmesg) logging to another server on the same network, but neither produced any output. That makes me think it’s a driver issue that affects the CPU itself, not even a kernel crash.
I’m running the beta2 image in docker-compose; the beta3 image has an issue with go2rtc failing to parse the camera feed URLs.
If you have any ideas for any further troubleshooting I could do, please do let me know.
@ggidofalvy-tc @madasus Especially if your Frigate machine is headless, I would recommend removing the often-default quiet kernel parameter (command-line argument) and adding debug. That’s what helped in my case linked above to at least narrow down the issue to the GPU, but I have made limited progress beyond that, as NickM has linked. My errors only showed up on the physical console, due to the hang.
It’s certainly suspicious that what I reported in #8338 also involves an i7-6600(U) / Skylake GPU, the same generation as you both. I wonder whether there is a driver bug or hardware quirk specific to this generation that the i915 driver isn’t handling.
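For anyone wanting to try the kernel-parameter change, here is a minimal sketch of the edit. It assumes a GRUB-based Ubuntu install where the parameters live in the GRUB_CMDLINE_LINUX_DEFAULT line of /etc/default/grub. The helper name debug_cmdline is hypothetical, and it only transforms the text of that one line; writing the file back, running update-grub and rebooting are left to you.

```python
def debug_cmdline(line: str) -> str:
    """Return a GRUB_CMDLINE_LINUX_DEFAULT line with 'quiet' removed and 'debug' added.

    Hypothetical helper: it rewrites the text of a single line and never
    touches /etc/default/grub itself.
    """
    key, sep, value = line.strip().partition("=")
    if key != "GRUB_CMDLINE_LINUX_DEFAULT" or not sep:
        raise ValueError("expected a GRUB_CMDLINE_LINUX_DEFAULT=... line")

    # Drop the surrounding quotes, then split into individual parameters.
    params = value.strip().strip("\"'").split()

    # Remove 'quiet' so kernel messages reach the physical console.
    params = [p for p in params if p != "quiet"]

    # Add 'debug' once; calling the function again changes nothing.
    if "debug" not in params:
        params.append("debug")

    return f'{key}="{" ".join(params)}"'


if __name__ == "__main__":
    print(debug_cmdline('GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"'))
    # prints: GRUB_CMDLINE_LINUX_DEFAULT="splash debug"
```

After a reboot, /proc/cmdline shows which parameters the running kernel was actually started with.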
The next steps would be to back down frigate to a bare minimum config and slowly add parts back until you can see what is causing the issue.
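The add-parts-back approach could be sketched as a small generator of staged configs. This is only an illustration: the function name staged_configs, the choice of which sections count as the bare minimum, and the section values in the example are all assumptions. It works on an already-parsed config dict rather than on the Frigate YAML file directly.

```python
from typing import Any, Dict, Iterator, List, Tuple


def staged_configs(
    full: Dict[str, Any], base_keys: List[str]
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (label, config) pairs, starting minimal and adding one section per stage.

    full      -- the complete parsed config (top-level section name -> value)
    base_keys -- sections assumed necessary for the bare-minimum config
    """
    current = {k: full[k] for k in base_keys if k in full}
    yield "<base>", dict(current)

    # Add the remaining top-level sections back one at a time, in file order.
    for key, value in full.items():
        if key in current:
            continue
        current[key] = value
        yield key, dict(current)


if __name__ == "__main__":
    # Placeholder values; a real run would load the actual config instead.
    full = {"mqtt": {}, "cameras": {}, "ffmpeg": {}, "birdseye": {}, "record": {}}
    for label, cfg in staged_configs(full, base_keys=["mqtt", "cameras"]):
        print(label, sorted(cfg))
```

Each stage would need to run for longer than the usual interval between freezes before moving on. The first stage that freezes points at the section added in that step.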
Followed the advice of @Pingbo for using the yolov8 model as well. Been running for a couple weeks without any issues. My detections are more reliable as well, so that is an added bonus. Thanks @Pingbo
@Pingbo Thank you for the help and the detailed instructions! I’ve been using yolov8n for nearly two weeks now without any crashing on beta2.
I think the issue might indeed be caused by the combination of the bundled ssdlite_mobilenet_v2 model and Skylake-gen OpenVINO – is this perhaps worth documenting somewhere?