srt: [BUG] CUDTGroup::recv() may cause the "No room" issue

Describe the bug

m_Positions in CUDTGroup::recv() should not include the listen socket. Otherwise the listen socket may become the HORSE and m_RcvBaseSeqNo becomes 0. After that, group->updateReadState() never sets the group readable, because seqcmp(1676307874, m_RcvBaseSeqNo) always returns a negative value. The result is “No room to store incoming packets”.

To Reproduce

Broadcast video over SRT to a server (live mode) over a very poor network.

Expected behavior

The listen socket should never be used as the HORSE, and the “No room” error should not occur.

Desktop (please provide the following information):

  • OS: Linux
  • SRT Version / commit ID: master/481e7f7

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 40 (19 by maintainers)

Most upvoted comments

Ok, I found another lock-order violation. Locking CUDTSocket::m_ControlLock here is probably unnecessary, and it may cause a deadlock because it is ordered before m_ConnectionLock. Fortunately, m_ConnectionLock is ordered before m_GlobControlLock, whose locking is absolutely necessary. I updated the branch; you might want to use it for further testing.

Ok, would you be able to help me with testing it?

The same branch is on my replica of the repo: dev-test-spurious-group-epoll. I have only done a rough test with non-blocking mode and a pause-5s-resume on the sender; at least it stops and then resumes reading.

Ah, ok. I’ll need to review this again, but I think I understand the problem.

It might be. I just need to track which socket is which here. Don’t forget that this listener socket is there for a reason: if the group is on the listener side, the group reader must also track the listener for any new incoming connection. This lets it interrupt the waiting function when none of the existing connections provide data but a newly connected socket would. The newly connected socket therefore causes the SRT_EPOLL_UPDATE event to be set on the listener socket, which should interrupt the wait and add the newly accepted socket to the pool.

Yes, it keeps repeating:

08:32:41.303176/SRT:Listener D:SRT.gr: group/recv: ALL LINKS ELEPHANTS. Re-polling.
08:32:41.303187/SRT:Listener D:SRT.gr: group/recv: Reviewing member sockets for polling
08:32:41.303197/SRT:Listener D:SRT.gr: group/recv: E(7) @56331898[READ] @56331896[READ] @56331895[READ]  --> EPOLL/SWAIT
08:32:41.303207/SRT:Listener D:SRT.ea: srt_epoll_update_usock: UPDATED E7 for @56331898 +
08:32:41.303217/SRT:Listener D:SRT.br: isRcvDataReady: packet NOT extracted.
08:32:41.303233/SRT:Listener D:SRT.ei: epoll/update: @56331898 +[W]: E7 TRACKING: @56331895:[R][E] @56331896:[R][E] @56331898:[R][E] @56331908:[R][^U]  NOT updated: no changes
08:32:41.303247/SRT:Listener D:SRT.ei: epoll/update: @56331898 +[W]: E8 TRACKING: @56331895:[W][E] @56331896:[W][E] @56331898:[W][E]  NOT updated: no changes
08:32:41.303260/SRT:Listener D:SRT.ea: srt_epoll_update_usock: UPDATED E7 for @56331896 +
08:32:41.303270/SRT:Listener D:SRT.br: isRcvDataReady: packet NOT extracted.
08:32:41.303284/SRT:Listener D:SRT.ei: epoll/update: @56331896 +[W]: E7 TRACKING: @56331895:[R][E] @56331896:[R][E] @56331898:[R][E] @56331908:[R][^U]  NOT updated: no changes
08:32:41.303296/SRT:Listener D:SRT.ei: epoll/update: @56331896 +[W]: E8 TRACKING: @56331895:[W][E] @56331896:[W][E] @56331898:[W][E]  NOT updated: no changes
08:32:41.303307/SRT:Listener D:SRT.ea: srt_epoll_update_usock: UPDATED E7 for @56331895 +
08:32:41.303316/SRT:Listener D:SRT.br: isRcvDataReady: packet NOT extracted.
08:32:41.303331/SRT:Listener D:SRT.ei: epoll/update: @56331895 +[W]: E7 TRACKING: @56331895:[R][E] @56331896:[R][E] @56331898:[R][E] @56331908:[R][^U]  NOT updated: no changes
08:32:41.303343/SRT:Listener D:SRT.ei: epoll/update: @56331895 +[W]: E8 TRACKING: @56331895:[W][E] @56331896:[W][E] @56331898:[W][E]  NOT updated: no changes
08:32:41.303354/SRT:Listener D:SRT.ea: E7 rdy=1: @56331908:[R]  TRACKED: @56331895:[R][E] @56331896:[R][E] @56331898:[R][E] @56331908:[R][^U] 
08:32:41.303366/SRT:Listener D:SRT.gr: group/recv: 1 RDY: @56331908:[R] 
08:32:41.303377/SRT:Listener D:SRT.br: isRcvDataReady: packet NOT extracted.
08:32:41.303387/SRT:Listener D:SRT.br: isRcvDataReady: packet NOT extracted.
08:32:41.303397/SRT:Listener D:SRT.br: isRcvDataReady: packet NOT extracted.
08:32:41.303406/SRT:Listener D:SRT.gr: group/recv: NOT extracted anything - checking for a need to kick kangaroos
08:32:41.303416/SRT:Listener D:SRT.gr: group/recv: ALL LINKS ELEPHANTS. Re-polling.

@56331908 is the listener socket.

The test wasn’t conducted against the latest version, and there have been many changes in this area. I checked the procedure that extracts the sockets later considered for adding to the m_Positions container: they are taken from the member sockets that are read-ready. The listener socket is not a member socket; it is only added to the epoll container to track new links connecting to the group.

Epoll is likely subscribed to the listener socket, but the procedure probably doesn’t check whether the socket that reported read-readiness is a group member. The listener socket is definitely not a group member.