jibri: Ffmpeg eats all the memory and crashes within a minute - recording or streaming
Description
On Jitsi, when I start a recording or a streaming session, the recording/stream stops in less than a minute and my whole server becomes slow and unresponsive.
With top, I could pinpoint the culprit: ffmpeg. It eats away all the memory very quickly; in less than a minute my 8 GB are filled.
You can find attached the jibri log from when I tried a streaming session. Nothing stands out to me. I stopped the streaming after 15 seconds and ffmpeg was already at 40% memory.
Also, if I completely stop prosody, jicofo, jvb and jibri, log in as the jibri user, and start ffmpeg myself using the command I found in log.0.txt, I get the same issue: the CPU shoots to 150% and the memory keeps growing. I have to kill ffmpeg before it saturates the memory.
ffmpeg -y -v info -f x11grab -draw_mouse 0 -r 30 -s 1280x720 -thread_queue_size 4096 -i :0.0+0,0 -f alsa -thread_queue_size 4096 -i plug:bsnoop -acodec aac -strict -2 -ar 44100 -c:v libx264 -preset veryfast -maxrate 2976k -bufsize 5952k -pix_fmt yuv420p -r 30 -crf 25 -g 60 -tune zerolatency -f flv rtmp://a.rtmp.youtube.com/live2/aaa
If I remove every parameter related to sound from this ffmpeg command line, i.e. removing -f alsa -thread_queue_size 4096 -i plug:cloop -acodec aac, then the memory saturation issue goes away and memory usage is stable. So it clearly seems to be related to the sound. How can I debug this kind of issue?
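For reference, a sketch of the trimmed, video-only command: this is the full command from above with the ALSA input, audio codec and sample-rate flags dropped (the RTMP key is a placeholder, as in the original).

```shell
# Video-only variant of the command above: ALSA input, -acodec aac and
# -ar 44100 removed. With this, memory usage reportedly stays stable.
ffmpeg -y -v info -f x11grab -draw_mouse 0 -r 30 -s 1280x720 \
  -thread_queue_size 4096 -i :0.0+0,0 \
  -c:v libx264 -preset veryfast -maxrate 2976k -bufsize 5952k \
  -pix_fmt yuv420p -r 30 -crf 25 -g 60 -tune zerolatency \
  -f flv rtmp://a.rtmp.youtube.com/live2/aaa
```

Comparing the two invocations side by side like this helps confirm that only the audio path differs between the failing and the stable run.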
Possible Solution
Steps to reproduce
Environment details
Ubuntu 16, followed the instructions on GitHub
lsmod | grep snd_aloop
snd_aloop 24576 0
snd_pcm 106496 1 snd_aloop
snd 81920 3 snd_aloop,snd_timer,snd_pcm
jibri@JibriTestSrv:/root$ aplay -l
**** List of PLAYBACK Hardware Devices ****
card 0: Loopback [Loopback], device 0: Loopback PCM [Loopback PCM]
Subdevices: 8/8
Subdevice #0: subdevice #0
Subdevice #1: subdevice #1
Subdevice #2: subdevice #2
Subdevice #3: subdevice #3
Subdevice #4: subdevice #4
Subdevice #5: subdevice #5
Subdevice #6: subdevice #6
Subdevice #7: subdevice #7
card 0: Loopback [Loopback], device 1: Loopback PCM [Loopback PCM]
Subdevices: 8/8
Subdevice #0: subdevice #0
Subdevice #1: subdevice #1
Subdevice #2: subdevice #2
Subdevice #3: subdevice #3
Subdevice #4: subdevice #4
Subdevice #5: subdevice #5
Subdevice #6: subdevice #6
Subdevice #7: subdevice #7
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 103 (7 by maintainers)
Our only real solution was to increase the number of vCPUs. Increasing from 4 to 8 CPUs was the fix we used last time. It seems as if ffmpeg begins eating memory when the CPUs aren’t giving it enough power…
The main rationale for why we use Chrome as a compositor for recording with Jibri is that it’s the best method we have to go from multiple WebRTC streams of audio and video to a single video with one audio and one video stream. Any recorder will need to composite the videos, choose the active speaker, mix the audio, etc. Chrome happens to already do this, and the jitsi-meet client is custom-built for this job, so re-using it for recording has been the best method without needing to support a whole separate client. Would it be possible to do in a separate client? Absolutely, but then said client would need to be regularly updated whenever new features of Jitsi or Chrome dropped. So, it’s a reasonable question to ask why it’s designed this way, but the short answer is that with a small team, this was our best answer.
/usr/local/bin/ffmpeg
Make sure to update permissions for ffmpeg as below:
chmod 755 /usr/local/bin/ffmpeg
Restart the jibri services so that jibri picks up the ffmpeg command from the new location.
Wow, thank you so much for pointing that out: it’s the CPU that is causing ffmpeg to use more RAM. It completely makes sense as well. If the CPU is insufficient, ffmpeg just keeps putting frames into a queue to process, buffering them in memory, and as the queue grows, so does the RAM usage, until it finally breaks when it reaches the maximum memory capacity.
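A quick back-of-envelope calculation supports this explanation. Using the 1280x720 yuv420p settings from the ffmpeg command above, and an assumed encoding deficit of 10 fps (a figure picked purely for illustration):

```shell
# One raw yuv420p frame is width * height * 1.5 bytes. If the encoder
# falls 10 fps behind the 30 fps capture rate, the backlog grows by
# that many raw frames every second.
frame_bytes=$((1280 * 720 * 3 / 2))          # 1382400 bytes per frame
backlog_per_min=$((frame_bytes * 10 * 60))   # 829440000 bytes (~800 MB)
echo "frame: ${frame_bytes} B, backlog after 1 min: ${backlog_per_min} B"
```

At that rate an 8 GB machine would saturate within roughly ten minutes, and a faster-growing deficit would do it in under a minute, which matches the behaviour reported in this issue.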
I’ve been struggling to solve this issue for my setup. Video quality isn’t a concern, so I tweaked the ffmpeg encoding options until CPU usage stopped consistently hitting above 95%. My setup has to work with just 2 vCPUs, as more cores increase the cost on AWS significantly; on top of that, my region doesn’t have access to c5a instances.
I played around with many settings and noticed the following.
I am able to record with this configuration on a t3a.small AWS instance, which has just 2 vCPUs and 2 GB RAM. I don’t think video recording can go any cheaper than this while maintaining the 16:9 ratio. And if you need HD or maybe Full HD recording, 4 vCPUs are required; there is no way around it, unless you go with the ultrafast preset plus a lot of storage. Just putting up a summary of the above for a very stable Jibri recording setup that maintains a 16:9 aspect ratio.
I didn’t re-compile the jar file to modify the ffmpeg settings; instead, I modified the parameters on the fly by creating an ffmpeg wrapper script, which seems to work flawlessly.
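A minimal sketch of such a wrapper, assuming the real binary lives at /usr/bin/ffmpeg and that the only change needed is a cheaper x264 preset. Both are assumptions for illustration; the commenter's actual script and parameter changes are not shown in the thread.

```shell
#!/bin/bash
# Hypothetical wrapper saved as /usr/local/bin/ffmpeg so that jibri finds
# it first on PATH. It rewrites the encoder options on the fly before
# delegating to the real binary; REAL_FFMPEG and the preset swap below
# are illustrative assumptions, not the commenter's actual script.
REAL_FFMPEG=/usr/bin/ffmpeg

rewrite_preset() {
  # Print the argument list with "veryfast" replaced by "ultrafast",
  # the cheapest x264 preset, to cut CPU load on low-vCPU hosts.
  local out=""
  for a in "$@"; do
    [ "$a" = "veryfast" ] && a="ultrafast"
    out="$out $a"
  done
  printf '%s' "${out# }"
}

# Delegate only when the real binary exists and arguments were given.
# (Word splitting here assumes no spaces inside individual arguments,
# which holds for jibri's ffmpeg command line.)
if [ -x "$REAL_FFMPEG" ] && [ "$#" -gt 0 ]; then
  exec "$REAL_FFMPEG" $(rewrite_preset "$@")
fi
```

As noted earlier in the thread, the wrapper needs chmod 755 and a jibri service restart before it takes effect.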
I hope this helps for anyone looking for solution to this issue.
Let me know if you have queries.
Here the same: a virtual server with Ubuntu 18.04, 4 CPUs, 8 GB RAM. Very interestingly, if I set “disableThirdPartyRequests: true,” (Gravatar) in
/etc/jitsi/meet/meet.mydomain.com-config.js, my memory usage is stable. Can anybody confirm this?
That release is almost a year old; I’d suggest you test with the latest image.
On GCP/GKE, we’ve had much better luck with AMD Epyc machines (N2D) than standard ones (N1 - Intel up to Skylake). We haven’t done extensive testing, but with 2 cores and 4 GB RAM, N2D nodes could run Jibri for over half an hour while N1 nodes with the same or even better specs overloaded and crashed within minutes. If your CPU is not fast enough, frames will start buffering in RAM - it’s as simple as that, as far as I understood it.
Shared vCPUs are a no-go for any serious workload on any provider, this should be obvious. Their performance is extremely inconsistent.
And a year later, I arrived here for some answers, with hundreds of questions about why jibri is designed with so many hacks.
I am gonna try what you suggested @starkwiz
And I have some queries:
Oh my god, well done! Disabled ThirdPartyRequests and VideoBackground = recorded 12 minutes of video with 2 devices connected at 1022x1108 and Jibri used something like 800MB of RAM. Love you guys🚀
We’ve found the same issue on a local Kubernetes cluster.
It’s an Ubuntu 18.04 based cluster, with Jibri compiled 6 weeks ago from the testing release.
Jibri was recording directly into a NFS folder.
In our case, the Kernel wasn’t flushing cache memory quickly enough so OOM_Killer got triggered. We fixed that situation by moving the recordings folder to a local folder of the node and then moving the recording to NFS.
This way, the kernel behaves properly and never fills the machine, even though RAM consumption is still huge (cache, not RSS).
What’s weird is that the exact same Jibri deployment works fine on other Kubernetes, so, maybe it’s related to something else (base Kernel, CPU power…).
Still investigating.
In any case, maybe you can try increasing cache pressure in your kernels to avoid filling up your memory: vm.vfs_cache_pressure=150 or 200
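For instance, along these lines (the drop-in file name is an arbitrary choice; any file under /etc/sysctl.d/ works, and both commands need root):

```shell
# Raise cache pressure so the kernel reclaims dentry/inode caches sooner;
# applies immediately but does not survive a reboot.
sysctl -w vm.vfs_cache_pressure=150

# Persist across reboots via a drop-in file (the file name is arbitrary):
echo 'vm.vfs_cache_pressure = 150' | sudo tee /etc/sysctl.d/99-jibri-cache.conf
sudo sysctl --system   # reload settings from all sysctl config files
```

Values above the default of 100 make the kernel prefer reclaiming filesystem metadata caches, which is the behaviour described above for the NFS-heavy recording workload.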
We have tried setting disableThirdPartyRequests: true, however it did not seem to resolve the issue, unfortunately.