code-server: [Bug]: Heartbeat not being updated while user edits files or has terminal activity.
Is there an existing issue for this?
- I have searched the existing issues
OS/Web Information
- Web Browser: chrome, brave
- Local OS: linux, windows
- Remote OS: docker running debian (for example php:8.0.6-apache docker image)
- Remote Architecture: x86_64
code-server --version: v4.8.3 also tested against v4.9.1, and v4.7.3 4.8.3 977b853a1e162ab583aed64b1322d1515c57728c with Code 1.72.1
Steps to Reproduce
- Run code server and login in a web browser
- Continuously watch the heartbeat file in seperate window, to see when it was last updated (how old it is)
- Edit file and generally work in a open terminal window. Or just go do other things until heartbeat stops.
- After this do thins with code-server and VERY few times will get heartbeat to start updating again!
Expected
Heartbeat file $HOME/.local/share/code-server/heartbeat should have its time updated (touched) at least once every minute, while there is ANY activity. Essentially if a browser is connected to it, heartbeat should update. BUt it stops after a while.
Alternatively heartbeat could update on any auto-file save, menu activity, or terminal window activity.
Actual
After a period of time (could be anywhere from 15 to 45 minutes) the heartbeat stops updating.
At which point it will NOT start again even if user starts to edit existing open files, making changes, saving using ctrl-S, closing open file editing, or perform commands in terminal. Actual changes are being made, and confirmed both in the file system, or even from looking via the code-server terminal, and yet heartbeat does not reflect that the user has performed some action, and as such is NOT idle.
The heartbeat only starts updating again, if a new file is opened, or browser tab (google chrome, or brave) is refreshed, or re-opened. And then only for a VERY short time (up to 5 minutes or so) before it stops again, even though user is active and changes are being made (as above). Sometimes with only a single heartbeat update being seen.
This is BAD as the idle-timeout system we use relies on the heartbeat (see below). It can’t use web requests as there no new requests, as actual changes were made though open connections. Be it via terminal or file editing.
Logs
No log output has been seen in any log without verbose turned on. However with the length of times involved the size of the verbose log is HUGE! Especially as the ‘File Watcher’ also watches the file it logs to for the File Watching while verbose! EG: .local/share/code-server/logs/…/remoteagent.log
Looking at around the time the heartbeat stops 2023-01-13_15:14:54
And removing any line with the pattern File Watcher.*share/code-server
I get the following from remoteagent.log…
[2023-01-13 15:12:39.157] [remoteagent] [trace] [File Watcher (parcel)] [ADDED] /var/www/20230113-1512-01-
[2023-01-13 15:12:39.158] [remoteagent] [trace] [File Watcher (parcel)] [ADDED] /var/www/20230113-1512-01-/var
[2023-01-13 15:12:39.158] [remoteagent] [trace] [File Watcher (parcel)] [ADDED] /var/www/20230113-1512-01-/var/www
[2023-01-13 15:12:39.158] [remoteagent] [trace] [File Watcher (parcel)] [ADDED] /var/www/20230113-1512-01-/var/www/.local
[2023-01-13 15:12:39.158] [remoteagent] [trace] [File Watcher (parcel)] [ADDED] /var/www/20230113-1512-01-/var/www/.local/share
[2023-01-13 15:12:39.158] [remoteagent] [trace] [File Watcher (parcel)] >> normalized [ADDED] /var/www/20230113-1512-01-
[2023-01-13 15:12:39.158] [remoteagent] [trace] [File Watcher (parcel)] >> normalized [ADDED] /var/www/20230113-1512-01-/var
[2023-01-13 15:12:39.158] [remoteagent] [trace] [File Watcher (parcel)] >> normalized [ADDED] /var/www/20230113-1512-01-/var/www
[2023-01-13 15:12:39.158] [remoteagent] [trace] [File Watcher (parcel)] >> normalized [ADDED] /var/www/20230113-1512-01-/var/www/.local
[2023-01-13 15:12:39.158] [remoteagent] [trace] [File Watcher (parcel)] >> normalized [ADDED] /var/www/20230113-1512-01-/var/www/.local/share
[2023-01-13 15:15:00.644] [remoteagent] [trace] ExtensionManagementService.refreshReportedCache
[2023-01-13 15:15:00.644] [remoteagent] [trace] ExtensionManagementService.refreshControlCache {"malicious":[],"deprecated":{}}
[2023-01-13 15:16:00.640] [remoteagent] [trace] Started scanning user extensions
[2023-01-13 15:16:00.642] [remoteagent] [trace] Scanned user extensions: 0
I seriously doubt this is of any use.
Screenshot/Video
No response
Are you accessing code-server over HTTPS?
- I am using HTTPS.
Notes
I have recently had complaints from students and staff saying the Heartbeat has become flaky. The heartbeat is used by our docker system to determine when a user is no longer using the development environment they are running so it can be closed down after an hour of non-use.
I am finding the heartbeat stopped updating, even though files are still being edited or the terminal is being used. The heartbeat does start to work again if a different file is opened for editing, or the whole browser tab is refreshed. But not from actual use with files being edited or the terminal being used. The heart beat file, which normally updates once a minute just stops being updated.
Heartbeat is VITAL for the determination of the automatic shutdown of docker instances, as code-server holds web connections open, rather than making new requests (as jupyter notebook does). The Idle checker checks… 1/ user started or checked the status of docker instance from a control panel 2/ there has been any web request logged by the docker ingress service for that user on the code-server port (8443) or on other ports user may used in their docker environment (apache ports). 3/ the code-server heartbeat file was last updated.
System is basically as originally described in https://github.com/coder/code-server/issues/792#issuecomment-997541980 Only with the addition of web request checks being added, as per… https://github.com/coder/code-server/issues/1274#issuecomment-1011712084 to the ‘idle timeout’ check script that runs every ten minutes (it looks back and records in another timestamp file) and user web requests for specific ports in the last 15 minutes of ingress logs.
The heartbeat may be related to the refactor of the heartbeat in… https://github.com/coder/code-server/issues/5544
About this issue
- Original URL
- State: open
- Created a year ago
- Comments: 22 (10 by maintainers)
I have tested 4.10.0. Ran my heartbeat check and waited for it to time out.
Same as before, heartbeat only restarted on editing a new file, or reloading browser tab. all other actions did not reset the heartbeat, even though I could edit and save and file already being edited. and do interactive commands in a terminal.
Now I could add a ‘prompt command’ to reset heartbeat on terminal command activity, but I should not need to!
NOTE: previously heartbeat file will be updated once a minute, as long as the browser was connected. This is the behaviour which I actually expect.
@jsjoeio My thinking is either the Node code we call to get active connections is somehow wrong:
https://github.com/coder/code-server/blob/134e9b40b7e1de41ac763d871acbaf1307503e0a/src/node/routes/index.ts#L35-L42
Or the heartbeat code itself is buggy. Maybe we can start with some debug logging that tells us when the heartbeat is running, when it schedules itself to run, and when it stops and why it thinks it should stop (“stopping heartbeat because server reports there are no connections” for example).