webrtc: Data channel fails to send packets at high frequency when acting as a WebRTC server connected from a browser
Hi all,
I'm trying to test the bandwidth of the data channel based on your example.
I didn't change much except the data channel callbacks: I try to send to the client as fast as possible once a data channel is created (code shown below). With a controllable delay between send calls, I tried send frequencies of 10 Hz, 20 Hz, 25 Hz, 40 Hz, 50 Hz, etc.
The problem is that once the frequency goes above 40 Hz, the data received in the browser no longer matches the data I'm trying to send.
With the send frequency set to 40 Hz (65535 bytes per send call), the throughput jitters sharply and the browser receives fewer packets than I sent.
The gap increases as the send frequency gets larger.
(Note: in the picture, 192.168.1.2 is my WebRTC server, which runs the code and sends the data.)
With the send frequency set to 25 Hz (65535 bytes per send call as well), the throughput is smooth and the browser receives all the data I sent.
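(For scale: 65535 bytes per send call at 40 Hz is roughly 2.6 MB/s, and at 25 Hz roughly 1.6 MB/s, so the problems start at only a couple of MB/s of application throughput.)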
I have no idea what is going wrong; can anyone here help?
P.S. Below are my code snippets in case they are helpful.
I rewrote this part of the example https://github.com/webrtc-rs/webrtc/blob/20b59b7a40663c7546b105d7cd698ca4655a1fc4/examples/examples/data-channels/data-channels.rs#L117-L155 as follows:
// Register data channel creation handling
peer_connection.on_data_channel(Box::new(move |d: Arc<RTCDataChannel>| {
    let d_label = d.label().to_owned();
    let d_id = d.id();
    println!("New DataChannel {} {}", d_label, d_id);

    Box::pin(async move {
        use tokio::time;
        let d2 = Arc::clone(&d);
        let d_label2 = d_label.clone();
        let d_id2 = d_id;

        // Register channel opening handling
        d.on_open(Box::new(move || {
            println!("Data channel '{}'-'{}' open.", d_label2, d_id2);

            // 65535-byte payload per send call. The original snippet used an
            // unsafe `set_len` on an uninitialized Vec; a zero-filled buffer is
            // equivalent for this test and avoids undefined behaviour.
            let buf = vec![0u8; 65535];
            let mut current = time::Instant::now();
            let mut pkt_num = 0_usize;

            Box::pin(async move {
                // Send one message per iteration, paced by the sleep below
                // (40 ms between sends ≈ 25 Hz; shorter delays give higher rates).
                while d2.send(&buf.to_vec().into()).await.is_ok() {
                    pkt_num += 1;
                    time::sleep(time::Duration::from_millis(40)).await;
                    if current.elapsed().as_secs() > 1 {
                        // Note: pkt_num is cumulative, it is never reset.
                        println!("current send {} packets", pkt_num);
                        current = time::Instant::now();
                    }
                }
            })
        }));

        // Register text message handling
        d.on_message(Box::new(move |_: DataChannelMessage| {
            Box::pin(async move {})
        }));

        d.on_close(Box::new(|| std::process::exit(0)));
    })
}));
My JavaScript code is quite simple, as shown below:
let pc = new RTCPeerConnection({
  iceServers: [
    {
      urls: 'stun:stun.l.google.com:19302'
    }
  ]
})

var recv_bytes = 0;
var pkts_num = 0;
let interval = null;

let sendChannel = pc.createDataChannel('foo')
sendChannel.onclose = () => {
  console.log('sendChannel has closed')
  clearInterval(interval);
}
sendChannel.onopen = () => {
  console.log('sendChannel has opened')
  interval = setInterval(() => {
    console.log("Recv " + pkts_num + " packets. Current rate: " + recv_bytes + " B/s");
    recv_bytes = 0;
  }, 1000);
}
sendChannel.onmessage = e => {
  recv_bytes += e.data.byteLength
  pkts_num += 1
  // sendChannel.send(e.data)
}

pc.oniceconnectionstatechange = e => console.log(pc.iceConnectionState)
pc.onnegotiationneeded = e =>
  pc.createOffer().then(d => {
    pc.setLocalDescription(d)
    fetch('/sdp', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/x-www-form-urlencoded'
      },
      body: "client=" + btoa(JSON.stringify(d))
    }).then(response => response.text().then(response => {
      pc.setRemoteDescription(new RTCSessionDescription(JSON.parse(atob(response))))
    }))
  }).catch(console.log)
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 32 (24 by maintainers)
Commits related to this issue
- sctp: reduce AI lock contention (#363) As discussed in #360 the lock on the internal association is very contended in high send-bandwidth situations. This PR achieves two things: 1. Pull the marsh... — committed to webrtc-rs/webrtc by KillingSpark a year ago
- sctp: optimize packet marshalling (#364) As discussed in #360 the marshalling code has become a bottle neck in high bandwidth sending situations. I found two places that had a big effect on the perfo... — committed to webrtc-rs/webrtc by KillingSpark a year ago
- sctp: improve payload queue push performance (#365) As discussed in #360 Gathering packets to send is a big chunk of the work the Association::write_loop is doing while in a critical section. This... — committed to webrtc-rs/webrtc by KillingSpark a year ago
- sctp: limit the bytes in the PendingQueue by using a semaphore (#367) As discussed in #360 the pending queue can grow indefinitely if the sender writes packets faster than the association is able to ... — committed to webrtc-rs/webrtc by KillingSpark a year ago
The three PRs combined have this effect for me:
Current master
All three PRs combined
Still not at pion levels, but the next big bottleneck now seems to be the CRC32 implementation.
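For context, SCTP checksums every outgoing packet with CRC32c (Castagnoli), so a slow implementation sits directly on the hot send path. As a rough illustration only (this is not necessarily what webrtc-rs uses), a table-driven CRC32c via the crc crate looks like this:

// Illustration of the CRC32c (Castagnoli) checksum SCTP requires per packet.
// Sketch only; the actual implementation inside webrtc-rs may differ.
use crc::{Crc, CRC_32_ISCSI};

const CRC32C: Crc<u32> = Crc::<u32>::new(&CRC_32_ISCSI);

fn sctp_checksum(packet: &[u8]) -> u32 {
    // SCTP computes the checksum over the whole packet with the checksum field
    // zeroed out; that zeroing step is omitted here for brevity.
    CRC32C.checksum(packet)
}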
@HsuJv If you want to try this out: I pushed a branch with the current state of the three PRs merged here: https://github.com/KillingSpark/webrtc/tree/merged
Yup, it's lock contention, specifically on the Mutex around the InternalAssociation struct.
"Occupied" time in the read_loop measures the time used to process a SACK, and in the write_loop it is the time spent gathering packets to send.
"Taking" time just measures how long it took to actually lock the mutex before doing the above operation.
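To make the two measurements concrete, here is a minimal sketch of how "taking" vs. "occupied" time can be captured around a tokio Mutex (identifiers are illustrative, not the actual webrtc-rs code):

// Sketch: measuring lock-acquisition ("taking") time vs. lock-hold ("occupied")
// time around a contended tokio Mutex. Names are illustrative only.
use std::sync::Arc;
use std::time::Instant;
use tokio::sync::Mutex;

async fn do_work_under_lock(assoc: Arc<Mutex<u64>>) {
    let t0 = Instant::now();
    let mut guard = assoc.lock().await; // waiting here is the "taking" time
    let taking = t0.elapsed();

    let t1 = Instant::now();
    *guard += 1; // stand-in for SACK processing / packet gathering
    drop(guard);
    let occupied = t1.elapsed(); // the "occupied" time

    println!("taking: {:?}, occupied: {:?}", taking, occupied);
}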
It seems pretty clear to me that the SACK processing is stalled by the lock contention. The solution to this is probably non-trivial. Maybe instead of making the write_loop gather packets to send, the packets should be put into a channel, either when they are queued and can be sent immediately within the rwnd, or when a SACK arrives that increases the rwnd? A rough sketch of that idea follows.
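A minimal sketch of that channel-based idea, with made-up types rather than the real webrtc-rs internals: ready-to-send packets are pushed into a bounded channel as soon as the rwnd allows them, and the write_loop only drains and sends.

// Sketch of the proposed architecture: the queueing path and the SACK-handling
// path push already-permitted packets into a channel, and the write_loop just
// drains it instead of gathering packets under the association lock.
// Types and names are illustrative only.
use tokio::sync::mpsc;

struct OutgoingPacket {
    bytes: Vec<u8>, // already marshalled and within the current rwnd
}

async fn write_loop(mut rx: mpsc::Receiver<OutgoingPacket>) {
    while let Some(pkt) = rx.recv().await {
        // hand pkt.bytes to the underlying transport here
        let _ = pkt.bytes.len();
    }
}

fn spawn_writer() -> mpsc::Sender<OutgoingPacket> {
    // Bounded channel, so producers are backpressured if the writer lags.
    let (tx, rx) = mpsc::channel(1024);
    tokio::spawn(write_loop(rx));
    tx
}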
Modified code for better understanding of the measurements above:
Edit: My current suspicion is that the marshalling of the packets is the culprit, because it is done while holding the lock. I think this is easy enough to fix, and maybe performance will be good enough without bigger changes to the architecture.
TLDR for this whole issue:
Separately, I noticed that the pending queue does not apply backpressure, so if the sender continuously writes faster than the connection is able to send, this queue will grow indefinitely and in the long run cause an OOM.
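A simplified sketch of the semaphore idea that later became #367 (not the actual implementation): a sender has to acquire one permit per byte before enqueueing, and the permits are only returned once those bytes have left the pending queue.

// Sketch: bounding the bytes buffered in a pending queue with a tokio Semaphore.
// Simplified illustration of the idea behind #367, not the actual code.
use std::sync::Arc;
use tokio::sync::Semaphore;

struct PendingQueue {
    // One permit per byte that may sit in the queue at any time.
    capacity: Arc<Semaphore>,
}

impl PendingQueue {
    fn new(max_buffered_bytes: usize) -> Self {
        Self { capacity: Arc::new(Semaphore::new(max_buffered_bytes)) }
    }

    /// Waits (asynchronously) until the payload fits under the byte budget.
    async fn push(&self, payload: Vec<u8>) {
        let needed = payload.len() as u32;
        let permit = self.capacity.acquire_many(needed).await.unwrap();
        permit.forget(); // returned later via add_permits() once sent
        // ... actually enqueue `payload` here ...
    }

    /// Called after `n` bytes have left the queue.
    fn mark_sent(&self, n: usize) {
        self.capacity.add_permits(n);
    }
}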
Way forward
@rainliu @k0nserv First of all, sorry for hijacking (and kinda spamming 😄) an issue on your repo, I hope that's okay. I got a bit carried away.
Do you have any objections to the changes I made in the write_loop (see the comment directly above)? Obviously the .unwrap()s need to be handled more gracefully. If not, I'd prepare two PRs: one for the changes in the write_loop and one for the optimizations to the marshal code. And probably a new issue for the pending queue not applying backpressure.
Yes, it gets smooth at around 130 MB/s. BTW, in my last comment the numbers were 20-40 MB/s and 70 MB/s; I had done a wrong calculation.
(My server always runs some heavy tasks, so it is expected that its performance is lower than my WSL's.)
Definitely should do that. I have some ideas for that as well (and prototyping shows it increases throughput even more), but I wanted to get these PRs merged before starting another one 😃
Edit: nevermind, found a good solution and couldn’t wait
Excellent research! Sounds like your changes are promising, @KillingSpark. Please roll them up into PRs for review.
OK, so my suspicion was right. Pulling the marshalling out from under the lock drastically reduces the total time the mutex is held… but… even if we trick tokio into not running the read_loop and write_loop on the same thread, we see something unfortunate:
537836 bytes in 7789 microseconds is 69,050,712 bytes/s, and since the write_loop is not idling, send performance is entirely bottlenecked by:
I can put together a PR that provides the behaviour above, which would allow optimizations on Packet::marshal to result in immediate throughput benefits. A rough sketch of the pattern follows.
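For illustration only (hypothetical types, not the actual webrtc-rs code), the pattern is: move the queued packets out of the shared state while holding the lock, drop the guard, and only then do the expensive marshalling.

// Sketch of "marshal outside the lock": the critical section only moves the
// packets out of the shared state; the serialization work happens afterwards,
// so it no longer stalls the read_loop's SACK processing.
// Types and fields are illustrative only.
use std::sync::Arc;
use tokio::sync::Mutex;

struct Packet {
    payload: Vec<u8>,
}

impl Packet {
    fn marshal(&self) -> Vec<u8> {
        self.payload.clone() // stand-in for the real, comparatively expensive marshalling
    }
}

struct SharedState {
    queued: Vec<Packet>,
}

async fn gather_and_marshal(state: Arc<Mutex<SharedState>>) -> Vec<Vec<u8>> {
    // Critical section: just take ownership of the queued packets.
    let packets: Vec<Packet> = {
        let mut guard = state.lock().await;
        std::mem::take(&mut guard.queued)
    }; // lock released here

    // Expensive work happens without holding the association lock.
    packets.iter().map(|p| p.marshal()).collect()
}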
Doing a few low-hanging optimizations on the marshalling code gets me to this throughput:
95 MByte/s is still not great, but it's something.
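As a generic example of the kind of low-hanging fruit meant here (not necessarily the exact change that went into the PR): reserving the full output size once instead of letting the buffer grow chunk by chunk while serializing.

// Generic illustration of a typical marshalling optimization: reserve the whole
// output size up front instead of reallocating repeatedly while appending.
// Not the actual webrtc-rs change, just the general idea.
use bytes::{BufMut, BytesMut};

fn marshal_chunks(chunks: &[Vec<u8>]) -> BytesMut {
    let total: usize = chunks.iter().map(|c| c.len()).sum();

    // A single allocation of the right size avoids repeated grow-and-copy
    // cycles on every put_slice call below.
    let mut out = BytesMut::with_capacity(total);
    for chunk in chunks {
        out.put_slice(chunk);
    }
    out
}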
Changes to the write_loop
Didn't test the UDP case, but I believe your numbers. The SCTP example you provided is very slow for me too, around 100ish packets instead of your 76ish. In a release build I can get up to 900 packets, though. That's still way less than it probably should be, but it goes to show how good the optimizer is 😃
A profile I did on a release build shows this:
This stack (37% of total execution time) is concerned with marshalling packets (20% of total execution time) and unmarshalling packets (6% of total execution time).
This stack (about 29%) is… blocking on a thread pool? That looks like tokio internals, but maybe it is influenced by how tokio is used in this library?
I performed this with Reliability::Rexmit 0, so the sender's performance should not depend on the receiver's performance.
My logs look like this:
Send 71410 pkts Throughput: 42728820 Bytes/s, 652 pkts, 652 loops
Send 68634 pkts Throughput: 41155980 Bytes/s, 628 pkts, 628 loops
Send 71338 pkts Throughput: 42794355 Bytes/s, 653 pkts, 653 loops
Send 68920 pkts Throughput: 39386535 Bytes/s, 601 pkts, 601 loops
Send 68421 pkts Throughput: 41024910 Bytes/s, 626 pkts, 626 loops
Send 69504 pkts Throughput: 41549190 Bytes/s, 634 pkts, 634 loops
All in all, this suggests a few things to me:
My speculation is: for some reason tokio does not read packets from the socket fast enough, so the processing code is not called often enough and many packets are dropped by the kernel because the socket receive queue overflows.
That so much time is spent blocking on a thread pool seems fishy and could be related. Not sure, though.
Edit:
A quick look at netstat confirms that the receive queue is overflowing. Now I just have to figure out why the packets are not retrieved fast enough.
I had a look around the code; maybe there is just high contention on the reassembly queue lock? That would perhaps explain the blocking on parked threads.
It seems something is wrong within the SCTP layer.
I just wrote a simple POC to do a throughput test and got logs like this, no matter whether set_reliability_params is true or false.
But if I remove the SCTP layer and use a tokio UDP socket directly, everything is OK.
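For reference, here is a minimal sketch of the kind of plain tokio UDP baseline described above (illustrative only, not the original POC; the receiver address and pacing values are made up):

// Minimal sketch of a UDP-only throughput baseline with tokio, i.e. the same
// kind of paced send loop but without the SCTP association in between.
// Address and pacing are placeholders.
use std::time::{Duration, Instant};
use tokio::net::UdpSocket;
use tokio::time::sleep;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let sock = UdpSocket::bind("0.0.0.0:0").await?;
    sock.connect("127.0.0.1:5000").await?; // hypothetical receiver

    let buf = vec![0u8; 1200]; // stay below a typical MTU to avoid IP fragmentation
    let mut sent = 0usize;
    let mut last = Instant::now();

    loop {
        sock.send(&buf).await?;
        sent += 1;
        sleep(Duration::from_micros(100)).await; // pacing, analogous to the data channel test

        if last.elapsed() >= Duration::from_secs(1) {
            println!("sent {} packets ({} bytes/s)", sent, sent * buf.len());
            sent = 0;
            last = Instant::now();
        }
    }
}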