srt: IPE: ACK node overwritten message (impact unknown)

There’s a message appeared in one of tests:

11:54:57.501306/SRT:RcvQ:worker*E: SRT.c: IPE: ACK node overwritten when acknowledging 1979 (ack extracted: 0)

Impact unknown, although this is qualified as IPE, so it shouldn’t happen. This was during tests against 1.1.3 version (Makito) and it might come from the other side. Need to make sure that this kind of problem is never CAUSED by the 1.3 version client.

About this issue

Most upvoted comments

I made a short review of the container and procedure used for ACK-window and there are lots of potential improvements that could be done here.

This is implemented as a manually operated cycled array container, in which there are stored values of ACK journal and ACK sequence number. Not sure which is being searched for, but by the call I state it’s the journal number, as ACKACK sends only this value (although naming for “sequence” is not exactly unanimous there). This number is being searched linearly, although these numbers are definitely sorted by the fact that they are being increased with every ACK sent. The fact that “all older records than the found one” are being deleted results from that all records that order below the found one are deleted, while they are older because insertion order comes with the time order.

What I think shall be first improved here is that this should now rely on ordering (including the fact that journal number are cyclic numbers that use also the same range as sequence numbers). I wouldn’t do a binary search here anyway - this would make sense only for regular numbers, but here are cyclic numbers, which’s comparison is more costly, so it wouldn’t be too profitable. But a check prematurely whether the incoming sequence number is older than the oldest stored number (and also potentially newer as the newest one, being completely rogue) is perfectly possible, and can simply result in a silent rejection of the case.

The function would also have to change the API a bit so that the state that the node is “already in the past” (for 1/4 of the cycle below the lowest sequence) and “rogue” (for 3/4 of the cycle above the highest sequence) can be distinguished. Maybe it could be also reasonable to keep the last 4 nodes that were deleted “indirectly” so that the out-of-order ACKACKs can be identified.

Thanks for the investigation! Now the reason is clear.

The class CACKWindow is responsible for tracking ACK-ACKACK pairs. When ACK is sent, it stores a record with the sending time. When ACKACK is received, it estimates RTT and removes all records older than the current one.

E.g., ACKACK no. 10 arrives on the receiver, ACK records 1-10 are removed. ACKACK no 9 arrives out of order, there is no record, which results in the IPE message.

TODO

  • Fix ACK window to not remove old records on ACKACK handling. Jitter must affect RTTVar estimate.
  • Consider Drift tracer (blocked by #753)
  • Improve the search for an ACK record knowing that records are stored in order (find by offset).