lpms: Transcoding hangs under load

While doing throughput testing, I hit a situation where transcoding hangs (reproduced three times). The test was transcoding one generated 10-minute video into seven 720p renditions, in ten streams simultaneously. Judging by the output files, most were transcoded completely, but some were not. nvidia-smi shows that the Livepeer process is still using the GPU, but the incomplete output files are not growing in size. I saw this on a GCE instance that is now stopped; I've left it intact. To reproduce:

  1. Start Ivan-gpu-p100 instance
  2. Go to /home/dark/go-livepeer
  3. Run bench.sh
  4. Wait 12 minutes
  5. Look at the /disk-1-temp directory: there will be 70 output files. Most are complete (same size), but some are smaller, and those smaller files are not growing in size.

About this issue

  • State: open
  • Created 5 years ago
  • Comments: 15 (15 by maintainers)

Most upvoted comments

Traced it down to this line:

https://github.com/FFmpeg/FFmpeg/blob/f7f4691f9f745f6a087879ab855dd65f9f16879d/libavcodec/nvdec.c#L162

It enters cuvidDestroyDecoder (a function in the Nvidia library) and never returns. So this is one of:

  1. Bug in Nvidia drivers
  2. FFmpeg messes something up
  3. Go runtime messes with threads/locks

Next, I'll try Nvidia's beta driver.

hangstack.txt