geckos.io: Geckos 2.0 http server hangs after a while

Hey there! I recently upgraded to geckos 2.0. Everything has been working swimmingly, except for one mysterious problem: after about two days of runtime, my geckos server starts timing out all requests. It sits behind an NGINX proxy, so the first error I see is:

2021/11/09 15:44:56 [error] 15785#15785: *5965624 upstream timed out (110: Connection timed out) while reading response header from upstream, request: "POST /geckos/.wrtc/v2/connections HTTP/1.1", upstream: "http://127.0.0.1:3030/.wrtc/v2/connections"

eventually followed by a similar one:

2021/11/09 16:06:34 [error] 15785#15785: *5969357 upstream timed out (110: Connection timed out) while connecting to upstream, request: "POST /geckos/.wrtc/v2/connections HTTP/1.1", upstream: "http://127.0.0.1:3030/.wrtc/v2/connections"

I haven’t found any correlated errors coming from the geckos server itself. It just suddenly starts hanging and won’t reply anymore 😦 CPU and MEM usage seem normal.

Any idea how best to dig into this?

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 45 (31 by maintainers)

Most upvoted comments

@marcwehbi Thank you for the gdb output. It looks like a regression introduced in libdatachannel v0.15.4: I refactored the transports teardown there, and it looks like it might make the PeerConnection deadlock on close. For some reason the timing that triggers the deadlock must occur on one machine and not on the other.
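
For reference, a trace like that can be captured from the live process with gdb. A minimal sketch, assuming gdb is installed and the pgrep pattern matches your server process:

# Attach to the hung Node process and dump a backtrace for every thread
PID=$(pgrep -f "node.*server")
sudo gdb -p "$PID" -batch -ex "thread apply all bt" > geckos-backtrace.txt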

Haven’t seen any deadlocks in the last 4 days! Looks fixed!

@murat-dogan @paullouisageneau @yandeu thank you all! I just deployed geckos v2.1.4. I was seeing the deadlock about once per day before, so I’ll wait a couple of days, see what happens, and then close this bad boy.

This should be fixed in libdatachannel v0.15.6; this is the PR to update node-datachannel: https://github.com/murat-dogan/node-datachannel/pull/64
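
To check whether the fix landed in your own deployment, a quick sketch (assuming geckos pulls in node-datachannel as a regular npm dependency and you use the @geckos.io/server package):

# Show which node-datachannel version (which bundles libdatachannel) is installed
npm ls node-datachannel

# Upgrade the geckos server package to pull in the patched release
npm install @geckos.io/server@latest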

@paullouisageneau woooooo!!! Out of curiosity, what part of the trace was the clue?

@bennlich In the trace, one thread waits for a lock somewhere in rtc::impl::PeerConnection::closeTransports()::{lambda()#1}::operator()() while another waits for the first one to finish in rtc::impl::Processor::join(). There is no debug info, but given the scenario, it appears the lock is related to callback synchronization and is rightfully held by the second thread. The mistake was that callbacks were reset in the wrong place in closeTransports(), creating the deadlock risk.
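
To make the shape of that deadlock concrete, here is a toy shell sketch of the same pattern (a hypothetical illustration, not libdatachannel code): the "join" side holds the lock that the "callback" side needs, while waiting for the callback side to finish.

#!/bin/bash
# Toy deadlock: A waits on a lock held by B, while B waits for A to exit
LOCK=/tmp/deadlock-demo.lock

exec 8>"$LOCK"
flock 8                                        # "join side" takes the callback lock first

( flock 9; echo "callback ran" ) 9>"$LOCK" &   # "callback side" blocks on the same lock
wait $!                                        # join side waits for callback side -> deadlock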

@murat-dogan Thanks!

I also just released geckos v2.1.4

@paullouisageneau Thanks a lot 👍🏻😊🥳

Just want to share an automated script to install the geckos example on Ubuntu 20.04 (AWS EC2).
It won’t solve this issue, but maybe it helps anyway 😃


Security Group (Firewall)

Protocol   Port range     Source
UDP        1024 - 65535   0.0.0.0/0
TCP        22             MY-IP/24
TCP        3000           0.0.0.0/0

Installation Script

The name of the user is ubuntu.

#!/bin/bash

# tested on ubuntu 20.04 / t3a.nano

sudo apt update && \
sudo apt upgrade -yq && \
sudo apt install cmake -yq && \

# Node.js 16.x (https://github.com/nodesource/distributions/blob/master/README.md#deb)
curl -fsSL https://deb.nodesource.com/setup_16.x | sudo -E bash - && \
sudo apt-get install -y nodejs && \
sudo npm install -g npm@8.1.4 && \

# Install pm2
sudo npm install -g pm2@latest && \

# Install gitget
sudo npm install -g gitget@latest && \

# Navigate home
cd /home/ubuntu && \

# Download Repository
sudo -u ubuntu gitget geckosio/simple-chat-app-example#httpServer && \

# Install
cd simple-chat-app-example && \
sudo -u ubuntu npm install && \

# PM2
sudo -u ubuntu pm2 start npm -- run serve && \
sudo -u ubuntu pm2 save && \
sudo env PATH="$PATH:/usr/bin" /usr/lib/node_modules/pm2/bin/pm2 startup systemd -u ubuntu --hp /home/ubuntu && \
sudo -u ubuntu pm2 save && \

# Finish
cd ../ && \
# Write the status file as the ubuntu user (tee performs the write)
echo "done" | sudo -u ubuntu tee status.txt && \
sudo shutdown -r now
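
After the final reboot, a quick check that pm2 resurrected the app (standard pm2 commands, run as the ubuntu user):

# Confirm the saved process list was restored and the app is online
pm2 status

# Tail recent output in case the server failed on boot
pm2 logs --lines 50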

I’ve just swapped to Twilio’s STUN server; I’ll see in an hour or so if it freezes again. If it does, I’m just gonna destroy the droplet and restore it from the EU image to see if maybe it’s some server configuration I overlooked.