jibri: Ffmpeg eats all the memory and crashes within a minute - recording or streaming

Description

On Jitsi, when I start a recording or streaming session, the recording/stream stops in less than a minute and my whole server becomes slow and unresponsive.

With top, I could pinpoint the culprit: ffmpeg. It eats up all the memory very quickly; in less than a minute my 8 GB are filled.

Attached is the Jibri log from when I tried a streaming session. Nothing stands out to me. I stopped the stream after 15 seconds and ffmpeg was already at 40% memory.

Also, if I completely stop prosody, jicofo, jvb and jibri, log in as the jibri user and start ffmpeg myself with the command I found in log.0.txt, I get the same issue: the CPU shoots to 150% and the memory keeps growing. I have to kill ffmpeg before it saturates the memory.

ffmpeg -y -v info -f x11grab -draw_mouse 0 -r 30 -s 1280x720 -thread_queue_size 4096 -i :0.0+0,0 -f alsa -thread_queue_size 4096 -i plug:bsnoop -acodec aac -strict -2 -ar 44100 -c:v libx264 -preset veryfast -maxrate 2976k -bufsize 5952k -pix_fmt yuv420p -r 30 -crf 25 -g 60 -tune zerolatency -f flv rtmp://a.rtmp.youtube.com/live2/aaa

If I remove every sound-related parameter from this ffmpeg command line, i.e. -f alsa -thread_queue_size 4096 -i plug:cloop -acodec aac, the memory saturation issue goes away and memory usage stays stable. So it clearly seems to be related to the sound. How can I debug this kind of issue?
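For reference, this is the same command with the audio input and audio encoding options stripped out (everything else kept as in the log); per the above, this is the variant that stays memory-stable:

ffmpeg -y -v info -f x11grab -draw_mouse 0 -r 30 -s 1280x720 -thread_queue_size 4096 -i :0.0+0,0 -c:v libx264 -preset veryfast -maxrate 2976k -bufsize 5952k -pix_fmt yuv420p -r 30 -crf 25 -g 60 -tune zerolatency -f flv rtmp://a.rtmp.youtube.com/live2/aaa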

Possible Solution


Steps to reproduce


Environment details

Ubuntu 16, followed the instructions on GitHub

lsmod | grep snd_aloop
snd_aloop              24576  0
snd_pcm               106496  1 snd_aloop
snd                    81920  3 snd_aloop,snd_timer,snd_pcm

jibri@JibriTestSrv:/root$ aplay -l
**** List of PLAYBACK Hardware Devices ****
card 0: Loopback [Loopback], device 0: Loopback PCM [Loopback PCM]
  Subdevices: 8/8
  Subdevice #0: subdevice #0
  Subdevice #1: subdevice #1
  Subdevice #2: subdevice #2
  Subdevice #3: subdevice #3
  Subdevice #4: subdevice #4
  Subdevice #5: subdevice #5
  Subdevice #6: subdevice #6
  Subdevice #7: subdevice #7
card 0: Loopback [Loopback], device 1: Loopback PCM [Loopback PCM]
  Subdevices: 8/8
  Subdevice #0: subdevice #0
  Subdevice #1: subdevice #1
  Subdevice #2: subdevice #2
  Subdevice #3: subdevice #3
  Subdevice #4: subdevice #4
  Subdevice #5: subdevice #5
  Subdevice #6: subdevice #6
  Subdevice #7: subdevice #7

browser.0.txt log.0.txt ffmpeg.0.txt asoundrc.txt

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 103 (7 by maintainers)

Most upvoted comments

Our only real solution was to increase the number of vCPUs. Going from 4 to 8 CPUs was the fix we used last time. It seems as if ffmpeg begins eating memory when the CPUs can't deliver enough power to it…

The main rationale for why we use Chrome as a compositor for recording with Jibri is that it's the best method we have to go from multiple WebRTC streams of audio and video to a single video with one audio and one video stream. Any recorder will need to composite the videos, choose the active speaker, mix the audio, etc. Chrome happens to already do this, and the jitsi-meet client is custom-built for this job, so re-using it for recording has been the best method without needing to support a whole separate client. Would it be possible to do in a separate client? Absolutely, but then said client would need to be regularly updated whenever new Jitsi or Chrome features dropped. So it's a reasonable question to ask why it's designed this way, but the short answer is that with a small team, this was our best answer.

Thanks for the detailed explanation @starkwiz Very helpful.

Do you mind sharing your ffmpeg script? I’d like to try out your suggestions on a 2 vCPU Hetzner instance.

/usr/local/bin/ffmpeg

#!/bin/bash
echo ffmpeg in $0 # Comment this line out once you have confirmed that running "ffmpeg" resolves to this script.
ARGS="$*"
# Inject a scale filter to downscale the video to 854x480.
ARGS=$(echo "$ARGS" | sed 's/-tune zerolatency/-tune zerolatency -vf scale=854x480/')
exec /usr/bin/ffmpeg $ARGS # $ARGS left unquoted on purpose so it splits back into separate arguments

Make sure the wrapper is executable, as below: chmod 755 /usr/local/bin/ffmpeg

Restart the Jibri services so they pick up the ffmpeg command from the new location.

systemctl stop jibri
systemctl stop jibri-xorg
systemctl start jibri
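To check that the wrapper is actually picked up (assuming the usual PATH ordering where /usr/local/bin precedes /usr/bin, and that Jibri launches ffmpeg by name rather than by absolute path), something like the following should do:

command -v ffmpeg # should print /usr/local/bin/ffmpeg
ps -o args= -C ffmpeg # during a recording, the arguments should include -vf scale=854x480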

Our only real solution was to increase the number of vCPUs. Going from 4 to 8 CPUs was the fix we used last time. It seems as if ffmpeg begins eating memory when the CPUs can't deliver enough power to it…

Wow, thank you so much for pointing out that it's the CPU that causes ffmpeg to use more RAM. It completely makes sense as well: if the CPU is insufficient, ffmpeg just keeps putting frames in a queue to process, the buffer it uses for that grows, RAM usage grows with it, and it finally breaks when it reaches the maximum memory capacity.

I've been struggling to solve this issue for my setup. Video quality isn't a concern, so I tweaked the ffmpeg encoding options until CPU usage stopped consistently going above 95%. My setup has to work properly with just 2 vCPUs, as more cores increase the cost on AWS significantly, and on top of that my region doesn't have access to c5a instances.

I played around with many settings and noticed the following.

  1. Worked: Tried 720p HD recording with the ultrafast x264 preset, which is lighter on the CPU, but the file sizes are literally double those of the veryfast preset, around 10 to 12 MB/minute. This did give a stable recording, but at a very high storage cost. If you have enough resources to separately re-encode the MP4 files afterwards for a smaller size (see the sketch after this list), you can still keep great quality.
  2. Partially worked: The blurred video background shown when a participant joins from a phone with the video in portrait mode appeared to put a lot of pressure on ffmpeg and the CPU, so I disabled the blurred video background for portrait-mode participants.
  3. Not sufficient for 2 vCPUs: Tried reducing the frame rate from 30 to 24 at 1280x720, but it doesn't seem to help much with lowering CPU usage.
  4. Best: Added a scaling video filter with resolution 854x480, which keeps the 16:9 ratio so there is no image cropping. This helped the most: even with a full-screen video the max CPU utilisation is between 65% and 85%, which I think is awesome. I also consistently noticed that memory usage doesn't even go above 750 MB for the whole Jibri instance.
  5. Important: Use Google Chrome 78, as it's much lighter on CPU and RAM usage than newer versions. On Ubuntu 20.04, Google Chrome 78 (and any browser version other than the latest) crashes because of some changes in Ubuntu 20.04, so sticking to Ubuntu 18.04 is a good idea, at least for Jibri recording.
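A minimal re-encode sketch for point 1 (the input/output file names are just placeholders; tune the preset and CRF to taste):

ffmpeg -i recording.mp4 -c:v libx264 -preset veryfast -crf 25 -c:a copy recording-small.mp4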

I am able to record with this configuration on a t3a.small AWS instance, which has just 2 vCPUs and 2 GB RAM. I don't think video recording can go any cheaper than this while maintaining the 16:9 ratio. And if you need HD or even Full HD recording, 4 vCPUs are required; there is no way around it unless you go with the ultrafast preset plus a lot of storage. Here is a summary of the above for a very stable Jibri recording setup that keeps the 16:9 aspect ratio:

  1. An AWS instance with just 2 vCPUs and 2 GB RAM, or any equivalent, should do.
  2. OS: Ubuntu 18.04 x64
  3. Software: Google Chrome 78 + ChromeDriver 78 + JRE8
  4. Disable the video background for portrait mode in the Jitsi Meet configuration (see the config sketch after this list)
  5. Add a scaling video filter in ffmpeg to scale to 854x480
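A sketch of where those settings usually live (the exact file locations and the DISABLE_VIDEO_BACKGROUND option name depend on your jitsi-meet version, so treat this as an assumption to verify rather than a recipe; meet.mydomain.com is a placeholder):

// in /usr/share/jitsi-meet/interface_config.js (or /etc/jitsi/meet/meet.mydomain.com-interface_config.js)
DISABLE_VIDEO_BACKGROUND: true,

// in /etc/jitsi/meet/meet.mydomain.com-config.js (the disableThirdPartyRequests tip comes up further down in this thread)
disableThirdPartyRequests: true,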

I didn't recompile the jar file to modify the ffmpeg settings; instead I modified the parameters on the fly with the ffmpeg wrapper script above, which seems to work flawlessly.

I hope this helps anyone looking for a solution to this issue.

Let me know if you have queries.

Same here. Virtual server with Ubuntu 18.04, 4 CPUs, 8 GB RAM. Very interestingly, if I set "disableThirdPartyRequests: true," (Gravatar) in /etc/jitsi/meet/meet.mydomain.com-config.js, my memory usage is stable.

Can anybody confirm this?

That release is almost a year old; I'd suggest you test with the latest image.

On GCP/GKE, we've had much better luck with AMD Epyc machines (N2D) than standard ones (N1, Intel up to Skylake). We haven't done extensive testing, but with 2 cores and 4 GB RAM, N2D nodes could run Jibri for over half an hour, while N1 nodes with the same or even better specs overloaded and crashed within minutes. If your CPU is not fast enough, frames will start buffering in RAM; it's as simple as that, as far as I understood it.

Shared vCPUs are a no-go for any serious workload on any provider, this should be obvious. Their performance is extremely inconsistent.

And a year later, I arrived here looking for answers, with hundreds of questions about why Jibri is designed with so many hacks.

I am gonna try what you suggested @starkwiz

And I have some queries:

  • Is there any relation between the number of participants and the amount of CPU/memory the Jibri components use?
  • Tangential question: is anyone working on alternative ways of recording?

Same here. Virtual server with Ubuntu 18.04, 4 CPUs, 8 GB RAM. Very interestingly, if I set "disableThirdPartyRequests: true," (Gravatar) in /etc/jitsi/meet/meet.mydomain.com-config.js, my memory usage is stable.

Can anybody confirm this?

Oh my god, well done! Disabled ThirdPartyRequests and VideoBackground = recorded 12 minutes of video with 2 devices connected at 1022x1108 and Jibri used something like 800MB of RAM. Love you guys🚀

We’ve found the same issue on a local Kubernetes cluster.

It's an Ubuntu 18.04 based cluster, with Jibri built 6 weeks ago from the testing release.

Jibri was recording directly into a NFS folder.

In our case, the kernel wasn't flushing cache memory quickly enough, so the OOM killer got triggered. We fixed that by recording to a local folder on the node and then moving the finished recording to NFS.
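A minimal sketch of that workaround using Jibri's finalize script (this assumes your Jibri is configured with a finalize script, that the script receives the finished session's recording directory as its first argument, and that /mnt/nfs/recordings is your NFS mount; check the docs for your Jibri version):

#!/bin/bash
# Move the finished recording off the node-local disk onto the NFS share.
RECORDING_DIR="$1" # directory Jibri just finished writing (assumed to be passed as the first argument)
NFS_DIR="/mnt/nfs/recordings" # placeholder for the NFS mount point
mv "$RECORDING_DIR" "$NFS_DIR/"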

This way, the kernel behaves properly and never fills the machine, even though RAM consumption is still huge (cache, not RSS).

What's weird is that the exact same Jibri deployment works fine on other Kubernetes clusters, so maybe it's related to something else (base kernel, CPU power…).

Still investigating.

In any case, maybe you can try increasing the cache pressure in your kernels to avoid filling up your memory: vm.vfs_cache_pressure=150 or 200.
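A quick sketch of how to apply that as root (the 99-jibri.conf file name is just a placeholder):

sysctl -w vm.vfs_cache_pressure=150 # apply immediately
echo 'vm.vfs_cache_pressure = 150' > /etc/sysctl.d/99-jibri.conf # persist across reboots
sysctl --system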

Same here. Virtual server with Ubuntu 18.04, 4 CPUs, 8 GB RAM. Very interestingly, if I set "disableThirdPartyRequests: true," (Gravatar) in /etc/jitsi/meet/meet.mydomain.com-config.js, my memory usage is stable.

Can anybody confirm this?

We have tried setting disableThirdPartyRequests: true; however, it did not seem to resolve the issue, unfortunately.