stunner: help - intermitent failures connecting to workers on `cloudretro` example on AWS + EKS + ALB
I’ve been building the cloudretro
example for a while on multiple kubernetes distributions without issues!
Now I’m trying to run this example on AWS setup (EKS + Fargate + ALB) and I’m getting some intermitent errors:
- Sometimes I’m able to connect to the workers, and other times I have timeouts
- I suspect it has something to do with the ICE candidates that are reported on the application - I’ve posted evidences below on the differences
Versions:
- I’m using both
stunner
andstunner-gateway-operator
versions0.16.0
(chart and app) - EKS
1.27
Based on the information below, what is the culprit here? Is there anything I can tweak server side to facilitate this discovery and ensure a successful connection on first attempt?
error attempt
The following errors were captured on the browser using hte Developer tools.
- Usually when I first reach the coordinator, I usually get the error
[rtcp] ice gathering was aborted due to timeout 2000ms
. - I notice that only one user candidate
02895ab7-03e2-4f4a-9afe-daa99822e2d5.local
gets reported and its not reachable (and should not be!)
Error console:
keyboard.js?v=5:128 [input] keyboard has been initialized
joystick.js?v=3:275 [input] joystick has been initialized
touch.js?v=3:304 [input] touch input has been initialized
socket.js?v=4:36 [ws] connecting to wss://home.company.com/ws?room_id=&zone=
socket.js?v=4:42 [ws] <- open connection
socket.js?v=4:43 [ws] -> setting ping interval to 2000ms
controller.js?v=8:79 [ping] <-> {http://worker.company.com:9000/echo: 9999}
rtcp.js?v=4:17 [rtcp] <- received coordinator's ICE STUN/TURN config: [{"urls":"turn:udp.company.com:3478","username":"user-1","credential":"fQvzu2pFOBxtW5Al"}]
rtcp.js?v=4:106 [rtcp] ice gathering
rtcp.js?v=4:120 [rtcp] <- iceConnectionState: checking
rtcp.js?v=4:100 [rtcp] user candidate: {"candidate":"candidate:1680066927 1 udp 2113937151 02895ab7-03e2-4f4a-9afe-daa99822e2d5.local 54853 typ host generation 0 ufrag rj2C network-cost 999","sdpMid":"0","sdpMLineIndex":0,"usernameFragment":"rj2C"}
rtcp.js?v=4:108 [rtcp] ice gathering was aborted due to timeout 2000ms
success attempt
After a couple of retries, we finally have success:
- Notice that now there are 3 user candidates, one of them is reachable (the one with the public IP)
rtcp.js?v=4:100 [rtcp] user candidate: {"candidate":"candidate:2537811000 1 udp 2113937151 5774c39d-3ada-44b9-b95f-89a858000ac4.local 54903 typ host generation 0 ufrag sRpm network-cost 999","sdpMid":"0","sdpMLineIndex":0,"usernameFragment":"sRpm"}
rtcp.js?v=4:100 [rtcp] user candidate: {"candidate":"candidate:3567609059 1 udp 1677729535 89.180.168.100 42221 typ srflx raddr 0.0.0.0 rport 0 generation 0 ufrag sRpm network-cost 999","sdpMid":"0","sdpMLineIndex":0,"usernameFragment":"sRpm"}
rtcp.js?v=4:100 [rtcp] user candidate: {"candidate":"candidate:4028349664 1 udp 33562623 10.0.22.42 35233 typ relay raddr
Full log of the success connection:
keyboard.js?v=5:128 [input] keyboard has been initialized
joystick.js?v=3:275 [input] joystick has been initialized
touch.js?v=3:304 [input] touch input has been initialized
socket.js?v=4:36 [ws] connecting to wss://home.company.com/ws?room_id=&zone=
socket.js?v=4:42 [ws] <- open connection
socket.js?v=4:43 [ws] -> setting ping interval to 2000ms
controller.js?v=8:79 [ping] <-> {http://worker.company.com:9000/echo: 9999}
rtcp.js?v=4:17 [rtcp] <- received coordinator's ICE STUN/TURN config: [{"urls":"turn:udp.company.com:3478","username":"user-1","credential":"fQvzu2pFOBxtW5Al"}]
rtcp.js?v=4:106 [rtcp] ice gathering
rtcp.js?v=4:120 [rtcp] <- iceConnectionState: checking
rtcp.js?v=4:100 [rtcp] user candidate: {"candidate":"candidate:2537811000 1 udp 2113937151 5774c39d-3ada-44b9-b95f-89a858000ac4.local 54903 typ host generation 0 ufrag sRpm network-cost 999","sdpMid":"0","sdpMLineIndex":0,"usernameFragment":"sRpm"}
rtcp.js?v=4:100 [rtcp] user candidate: {"candidate":"candidate:3567609059 1 udp 1677729535 89.180.168.100 42221 typ srflx raddr 0.0.0.0 rport 0 generation 0 ufrag sRpm network-cost 999","sdpMid":"0","sdpMLineIndex":0,"usernameFragment":"sRpm"}
rtcp.js?v=4:100 [rtcp] user candidate: {"candidate":"candidate:4028349664 1 udp 33562623 10.0.22.42 35233 typ relay raddr 89.180.168.178 rport 42221 generation 0 ufrag sRpm network-cost 999","sdpMid":"0","sdpMLineIndex":0,"usernameFragment":"sRpm"}
rtcp.js?v=4:113 [rtcp] ice gathering completed
rtcp.js?v=4:120 [rtcp] <- iceConnectionState: connected
rtcp.js?v=4:123 [rtcp] connected...
I appreciate any help on this matter!
About this issue
- Original URL
- State: open
- Created 9 months ago
- Comments: 15 (9 by maintainers)
CC @bbalint105