go-livepeer: CUDA_ERROR_NOT_PERMITTED: operation not permitted - Error number -1448234581 occurred
Describe the bug This is the second server I’ve received this error on. At first I thought it was an issue on my side, but now I believe something else is going on:
sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E clientIP=84.17.50.98 LB: Transcode submitted for key=de278b92_0
[AVHWDeviceContext @ 0x7fc1ec335d00] cu->cuCtxCreate(&hwctx->cuda_ctx, desired_flags, hwctx->internal->cuda_device) failed -> CUDA_ERROR_NOT_PERMITTED: operation not permitted
ERROR: decoder.c:313] Unable to open hardware context for decoding : Unknown error occurred
ERROR: decoder.c:348] Unable to open video decoder : Error number -1448234581 occurred
E0826 05:09:06.423332 1 ffmpeg.go:977] Transcoder Return : Unrecoverable state, restart process
I0826 05:09:06.423372 1 lb.go:192] manifestID=3463mrgx6zbfk6p3 seqNo=0 orchSessionID=de278b92 clientIP=84.17.50.98 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Stopping transcoder due to error for key=de278b92_0
I0826 05:09:06.423381 1 lb.go:122] manifestID=3463mrgx6zbfk6p3 seqNo=0 orchSessionID=de278b92 clientIP=84.17.50.98 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E LB: Deleted transcode session for key=de278b92_0
panic: Unrecoverable state, restart process
goroutine 3715866 [running]:
github.com/livepeer/go-livepeer/core.(*LivepeerNode).transcodeSeg(0xc000452420, {0x87c600, 0xc000277830}, {{0x897030, 0xc000d08b80}, {0x897030, 0xc000d08b80}}, 0xc000d08b40, 0xc0008a18c0)
/src/core/orchestrator.go:557 +0xb3d
github.com/livepeer/go-livepeer/core.(*LivepeerNode).transcodeSegmentLoop.func1()
/src/core/orchestrator.go:660 +0xfe
created by github.com/livepeer/go-livepeer/core.(*LivepeerNode).transcodeSegmentLoop
/src/core/orchestrator.go:632 +0x44f
Livepeer is running in Docker. The error crashes the node, which then restarts immediately.
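The opaque error number in the log can be unpacked into something readable. The following is a minimal sketch, assuming the value follows FFmpeg's FFERRTAG convention (four ASCII characters packed little-endian into an integer, then negated). The helper name decode_fferrtag is hypothetical and is not part of go-livepeer or FFmpeg.

```python
def decode_fferrtag(err: int) -> str:
    """Unpack a negative FFmpeg-style error tag into its four characters.

    Assumption: the code was built as -(a | b << 8 | c << 16 | d << 24),
    which is how FFmpeg's FFERRTAG macro packs tags.
    """
    value = -err  # undo the negation applied by FFERRTAG
    if not 0 < value <= 0xFFFFFFFF:
        raise ValueError("not a packed four-character tag")
    raw = value.to_bytes(4, "little")  # first character is the low byte
    return raw.decode("ascii", errors="replace")


# The error number reported by decoder.c:348 in the log above.
print(decode_fferrtag(-1448234581))  # prints "UNRV"
```

The tag "UNRV" is consistent with the "Unrecoverable state, restart process" message that follows it in the log, which suggests the number is a wrapper code rather than a CUDA-specific one. The underlying failure is the CUDA_ERROR_NOT_PERMITTED returned by cuCtxCreate.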
To Reproduce Steps to reproduce the behavior:
Reproducing the error is difficult, as it doesn’t seem to occur at any predictable time.
Desktop (please complete the following information):
- OS: Ubuntu
- Version 22.04.1 LTS
About this issue
- State: open
- Created 2 years ago
- Comments: 21 (20 by maintainers)
All the Livepeer streams you’ve posted arrive ~20 mins after the hour, the same time you receive test streams from Livepeer. If Titan’s test is checking your T’s ability to transcode 21 sessions at the same time you have another job running, that seems like it might be the issue.
I would either lower your maxSessions for the pool (you are very unlikely to receive anywhere near 21 sessions) or try running just the O without the pool software for a few days and see if the issue persists.
Because all the logs posted are from Livepeer test streams, it’s unlikely that the session profile is causing the issue, contrary to what I previously suspected.
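One way to check the timing observation above is to count how many crash lines fall on each minute of the hour. This is a rough sketch, assuming the node's lines keep the glog-style prefix seen in the report (severity letter, mmdd, then hh:mm:ss.micros). The function name minutes_past_hour is hypothetical, and any extra prefix added by Docker's log driver would need to be stripped first.

```python
import re
from collections import Counter

# Assumed glog-style prefix, e.g. "E0826 05:09:06.423332 ".
GLOG_PREFIX = re.compile(r"^[IWEF]\d{4} (\d{2}):(\d{2}):(\d{2})\.\d+ ")


def minutes_past_hour(lines):
    """Count 'Unrecoverable state' log lines by minute of the hour."""
    counts = Counter()
    for line in lines:
        match = GLOG_PREFIX.match(line)
        # Skip lines without a timestamp, such as the Go panic output.
        if match and "Unrecoverable state" in line:
            counts[int(match.group(2))] += 1
    return counts


sample = [
    "E0826 05:09:06.423332 1 ffmpeg.go:977] Transcoder Return : Unrecoverable state, restart process",
    "panic: Unrecoverable state, restart process",
]
print(minutes_past_hour(sample))
```

Run over several days of logs, a cluster of counts around one minute value would support the idea that the crashes coincide with scheduled test streams rather than with ordinary traffic.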