foundryvtt-docker: Node segfault (error 139) in container release v11.x.x

Bug description

I can neither migrate old data nor can I start a new world. I always get the same error:

{"code":"ECOMPROMISED","errno":-2,"level":"error","message":"ENOENT: no such file or directory, stat '/data/Config/options.json.lock'","path":"/data/Config/options.json.lock","stack":"Error: ENOENT: no such file or directory, stat '/data/Config/options.json.lock'","syscall":"stat","timestamp":"2023-05-26 17:39:43"}
{"level":"error","message":"A fatal error occurred while trying to start the Foundry Virtual Tabletop server: Foundry VTT cannot start in this directory which is already locked by another process.","stack":"Error: A fatal error occurred while trying to start the Foundry Virtual Tabletop server: Foundry VTT cannot start in this directory which is already locked by another process.\n    at _acquireLockFile (file:///home/foundry/resources/app/dist/init.mjs:1:4799)\n    at async _initializeCriticalFunctions (file:///home/foundry/resources/app/dist/init.mjs:1:2593)\n    at async Module.initialize (file:///home/foundry/resources/app/dist/init.mjs:1:1564)","timestamp":"2023-05-26 17:41:11"}
{"level":"error","message":"A fatal error occurred while trying to start the Foundry Virtual Tabletop server: Foundry VTT cannot start in this directory which is already locked by another process.","stack":"Error: A fatal error occurred while trying to start the Foundry Virtual Tabletop server: Foundry VTT cannot start in this directory which is already locked by another process.\n    at _acquireLockFile (file:///home/foundry/resources/app/dist/init.mjs:1:4799)\n    at async _initializeCriticalFunctions (file:///home/foundry/resources/app/dist/init.mjs:1:2593)\n    at async Module.initialize (file:///home/foundry/resources/app/dist/init.mjs:1:1564)","timestamp":"2023-05-26 17:44:36"}

The data is stored on the host system as a bind volume under a directory. All was well with V10.

I attempted to ‘reset’ all the data as well, by stopping the container, moving the data to a new folder, re-creating the old folder completely empty, and creating a brand new world. The error logs above are from that last attempt.

I have tried removing the options.json.lock directory (isn’t this supposed to be a file, not a directory, btw?), recycling the container, to no avail. I suspect this may be a bug in foundry itself, but I’m not certain, and figured anyone else running into this issue from this container may find their way here…

Steps to reproduce

  1. Start a new container instance with a bind volume to hold the container data
  2. Create a new world
  3. Attempt to launch the world

Expected behavior

I expect the world to launch without error.

Container metadata

com.foundryvtt.version = "11.299"
org.opencontainers.image.authors = "markf+github@geekpad.com"
org.opencontainers.image.created = "2023-05-26T01:33:20.124Z"
org.opencontainers.image.description = "An easy-to-deploy Dockerized Foundry Virtual Tabletop server."
org.opencontainers.image.licenses = "MIT"
org.opencontainers.image.revision = "349bc278fa92049dd2b480b322fc30a0842221fb"
org.opencontainers.image.source = "https://github.com/felddy/foundryvtt-docker"
org.opencontainers.image.title = "foundryvtt-docker"
org.opencontainers.image.url = "https://github.com/felddy/foundryvtt-docker"
org.opencontainers.image.vendor = "Geekpad"
org.opencontainers.image.version = "11.299.0"

Relevant log output

No response

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 51 (21 by maintainers)

Most upvoted comments

I was having this problem and your solution worked for my system.

Problem

When upgrading or launching a world, Foundry crashed with the following logs:

Segmentation fault (core dumped)
Launcher | 2023-05-27 02:44:37 | [error] Node process exited with code 139

Solution

Exactly as described here: https://github.com/felddy/foundryvtt-docker/issues/697#issuecomment-1597821895

My system

$ uname -a
Linux raspberrypi 5.15.84-v7l+ #1613 SMP Thu Jan 5 12:01:26 GMT 2023 armv7l GNU/Linux
$ cat /etc/issue.net
Raspbian GNU/Linux 11
$ grep Model /proc/cpuinfo
Model           : Raspberry Pi 4 Model B Rev 1.5

Exact same issue for me, except here was my System:

$ uname -a
Linux DietPi 6.1.21-v7+ #1642 SMP Mon Apr  3 17:20:52 BST 2023 armv7l GNU/Linux
$ cat /etc/issue.net
Debian GNU/Linux 12
$ grep Model /proc/cpuinfo
Model           : Raspberry Pi 2 Model B Rev 1.1

Thank you @felddy!

Please let me know how this works. If it succeeds I can add this file to the repo’s patch library for others to use.

This fixed the segfault I was getting on a pi4, thank you!

(to be clear, I applied the changes to alpine, not the debian branch.)

As a workaround for this issue you can try the following:

  1. Create a container_patches directory in your data mount. e.g., data/container_patches.

  2. In this directory create a file called: patch-issue-697.sh with the following contents:

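    #!/bin/ash
    
    # Issue 697 Fix
    # =====================
    
    PATCH_DOC_URL="https://github.com/felddy/foundryvtt-docker/issues/697"
    PATCH_NAME="Fix for issue #697"
    
    log "Applying \"${PATCH_NAME}\""
    log "See: ${PATCH_DOC_URL}"
    
    # Install the toolchain needed to compile native Node modules.
    apk add g++ make python3
    # Rebuild classic-level from source in the Foundry application directory.
    cd resources/app
    npm install classic-level --build-from-source
    # Return to the previous working directory.
    cd -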
    
  3. Set the environment variable: CONTAINER_PATCHES=/data/container_patches in your container’s configuration.

  4. Start the container using the standard release image: felddy/foundryvtt:release
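
Steps 1 and 2 above could be scripted roughly as follows. This is only a minimal sketch and not part of the container: the function name `write_patch_697` and the `./data` path in the usage comment are assumptions for illustration, and the patch body is copied from step 2.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the container): create the
# container_patches directory and the patch file inside a data mount.
# Usage: write_patch_697 ./data
write_patch_697() {
  data_dir="$1"
  patch_dir="${data_dir}/container_patches"
  mkdir -p "${patch_dir}" || return 1

  # The quoted heredoc delimiter keeps ${...} unexpanded while writing the file.
  cat > "${patch_dir}/patch-issue-697.sh" <<'EOF'
#!/bin/ash

PATCH_DOC_URL="https://github.com/felddy/foundryvtt-docker/issues/697"
PATCH_NAME="Fix for issue #697"

log "Applying \"${PATCH_NAME}\""
log "See: ${PATCH_DOC_URL}"

apk add g++ make python3
cd resources/app
npm install classic-level --build-from-source
cd -
EOF
}
```

After running `write_patch_697 ./data`, steps 3 and 4 amount to mounting `./data` at `/data` and passing `CONTAINER_PATCHES=/data/container_patches` when starting `felddy/foundryvtt:release`.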

In your container logs you will see lines similar to this after Foundry is installed but before it starts:

...
foundryvtt-foundry-1  | Entrypoint | 2023-06-19 17:51:50 | [info] Using CONTAINER_PATCHES: /data/container_patches
foundryvtt-foundry-1  | Entrypoint | 2023-06-19 17:51:50 | [info] Container patches directory detected.  Starting patch application...
foundryvtt-foundry-1  | Entrypoint | 2023-06-19 17:51:50 | [info] Sourcing patch from file: /data/container_patches/issue-697.sh
foundryvtt-foundry-1  | Entrypoint | 2023-06-19 17:51:50 | [info] Applying "Fix for issue #697"
foundryvtt-foundry-1  | Entrypoint | 2023-06-19 17:51:50 | [info] See: https://github.com/felddy/foundryvtt-docker/issues/697
foundryvtt-foundry-1  | fetch https://dl-cdn.alpinelinux.org/alpine/v3.18/main/aarch64/APKINDEX.tar.gz
foundryvtt-foundry-1  | fetch https://dl-cdn.alpinelinux.org/alpine/v3.18/community/aarch64/APKINDEX.tar.gz
foundryvtt-foundry-1  | (1/29) Installing libstdc++-dev (12.2.1_git20220924-r10)
foundryvtt-foundry-1  | (2/29) Installing zstd-libs (1.5.5-r4)
foundryvtt-foundry-1  | (3/29) Installing binutils (2.40-r7)
foundryvtt-foundry-1  | (4/29) Installing libgomp (12.2.1_git20220924-r10)
foundryvtt-foundry-1  | (5/29) Installing libatomic (12.2.1_git20220924-r10)
...
foundryvtt-foundry-1  | Entrypoint | 2023-06-19 17:52:11 | [info] Completed file patching.
...

This should apply the steps as documented in the associated issue here:

Please let me know how this works. If it succeeds I can add this file to the repo’s patch library for others to use.


I too can confirm that this works on my Raspberry Pi 4 and solves the segfault 🥳 I will note that this increases the startup time of the container to a little over two minutes on the Pi 4 hardware due to the compilation step, perhaps a useful disclaimer to mention when publishing the patch.

Thanks for all the efforts to resolve this issue!

@mcdonnnj - not yet. Work has been killing me, and I had a bit of a rough zfs pool migration that JUST finished on the home lab. I’ve been utterly exhausted, which is why I haven’t participated since I moved FoundryVTT-docker from the ARM64 RPi4 over to the x86 server LOL - I will try to test it tonight, but I’m not making promises; my wife just went back into the hospital this morning and I’m running the house as well…

Thanks for testing. I think these are the most telling test results so far. I’ve got a couple of problems to overcome now:

  1. I need to do some thinking about how to run this down further despite not being able to reproduce it yet.
  2. The FoundryVTT developers are anti-container and do not support containerized installs, or bug reports related to them.

I’ll keep working on this. Thank you again for the test help.

So, to be absolutely certain, I wanted to try running things in a barebones container, install Foundry V11 manually, and see if it would still segfault. The short version is that it works just fine this way. This suggests it might not be a Foundry bug after all, or at least that there is some weird interaction with the way the container is set up, since I got it to work in another container configuration on the same hardware.

So here is what I did:

  • I pulled the latest official Ubuntu docker image (since I know how to work with that)
  • Ran the image with docker run with just port 30000 mapped and dropping straight into a bash shell, no other options
  • Installed only the following tools: sudo, wget, curl, libssl-dev and installed nodejs 18 using the official install script
  • I then followed the install instructions here https://foundryvtt.com/article/installation/ to the letter and ran Foundry
  • Installed dnd5e system, created a test world and launched it
  • No segfault!

So it’s definitely possible to run Foundry V11 in a container on a Raspberry Pi 4… not sure if this makes it easier or harder now, but I feel this takes us a step closer to a solution…

A Debian version of the container is now building:

When/If it completes successfully it can be pulled with felddy/foundryvtt:testing-issue-697.

Note: I had to drop arm/v6 support for now as the Debian base image did not exist for that platform.

Please test and let me know if this temporary work-around works.

@felddy This container unfortunately doesn’t start, here’s the error:

v11-foundry-1  | Entrypoint | 2023-06-13 17:07:17 | [info] Starting felddy/foundryvtt container v11.301.0
v11-foundry-1  | Entrypoint | 2023-06-13 17:07:17 | [info] No Foundry Virtual Tabletop installation detected.
v11-foundry-1  | Entrypoint | 2023-06-13 17:07:17 | [info] Using CONTAINER_CACHE: /data/container-cache
v11-foundry-1  | Entrypoint | 2023-06-13 17:07:17 | [info] Installing Foundry Virtual Tabletop 11.301
v11-foundry-1  | Entrypoint | 2023-06-13 17:07:24 | [info] Preserving release archive file in cache.
v11-foundry-1  | Entrypoint | 2023-06-13 17:07:24 | [info] Not modifying existing installation license key.
v11-foundry-1  | Entrypoint | 2023-06-13 17:07:24 | [info] Setting data directory permissions.
v11-foundry-1  | Entrypoint | 2023-06-13 17:07:24 | [info] Starting launcher with uid:gid as foundry:foundry.
v11-foundry-1  | Launcher | 2023-06-13 17:07:24 | [info] Generating options.json file.
v11-foundry-1  | Launcher | 2023-06-13 17:07:24 | [info] Setting 'Admin Access Key'.
v11-foundry-1  | Launcher | 2023-06-13 17:07:25 | [info] Starting Foundry Virtual Tabletop.
v11-foundry-1  | node:internal/modules/cjs/loader:1338
v11-foundry-1  |   return process.dlopen(module, path.toNamespacedPath(filename));
v11-foundry-1  |                  ^
v11-foundry-1  |
v11-foundry-1  | Error: /usr/lib/arm-linux-gnueabihf/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /home/foundry/resources/app/node_modules/classic-level/prebuilds/linux-arm/node.napi.armv7.node)
v11-foundry-1  |     at Module._extensions..node (node:internal/modules/cjs/loader:1338:18)
v11-foundry-1  |     at Module.load (node:internal/modules/cjs/loader:1117:32)
v11-foundry-1  |     at Module._load (node:internal/modules/cjs/loader:958:12)
v11-foundry-1  |     at Module.require (node:internal/modules/cjs/loader:1141:19)
v11-foundry-1  |     at require (node:internal/modules/cjs/helpers:110:18)
v11-foundry-1  |     at load (/home/foundry/resources/app/node_modules/node-gyp-build/node-gyp-build.js:22:10)
v11-foundry-1  |     at Object.<anonymous> (/home/foundry/resources/app/node_modules/classic-level/binding.js:1:43)
v11-foundry-1  |     at Module._compile (node:internal/modules/cjs/loader:1254:14)
v11-foundry-1  |     at Module._extensions..js (node:internal/modules/cjs/loader:1308:10)
v11-foundry-1  |     at Module.load (node:internal/modules/cjs/loader:1117:32) {
v11-foundry-1  |   code: 'ERR_DLOPEN_FAILED'
v11-foundry-1  | }
v11-foundry-1  |
v11-foundry-1  | Node.js v18.16.0
v11-foundry-1  | Launcher | 2023-06-13 17:07:27 | [error] Node process exited with code 1
v11-foundry-1 exited with code 0

@felddy I tried two more things (tell me when to stop):

  • Tried running the same steps as above, but this time with the official Alpine 3.18 container with NodeJS v18 or v20
  • In both cases I get the segmentation fault

So on the Raspberry Pi 4 hardware, running Foundry v11 works with an Ubuntu-based image but not with Alpine. Is there a way we could get a release of your container setup based on a distro other than Alpine for this case, or would it be possible for me to modify and build a version myself?

Thanks

Ok. That gives us more data. Thank you for helping run this down. I wish I could reproduce it myself.

I’ve got another idea. Let’s start up the last version of the image that was working, but request that Foundry version 11.300 be installed.

So that would mean starting felddy/foundryvtt:10.291.1 and setting the environment variable FOUNDRY_VERSION to 11.300.
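
For illustration, the test described above could be assembled like this. It is a hedged sketch, not the maintainer’s own command: the helper name `foundry_mismatch_cmd` and the data directory are assumptions, the helper only prints the command instead of running it, and the credentials or cached release archive the container needs to download Foundry are left out.

```shell
#!/bin/sh
# Hypothetical helper: print the docker command for the version-mismatch test.
# It only prints the command so it can be inspected before being run.
# Usage: foundry_mismatch_cmd IMAGE_TAG FOUNDRY_VERSION DATA_DIR
foundry_mismatch_cmd() {
  image_tag="$1"
  foundry_version="$2"
  data_dir="$3"
  # Port 30000 and the /data mount point are the ones used elsewhere in this thread.
  printf 'docker run -p 30000:30000 -v %s:/data -e FOUNDRY_VERSION=%s felddy/foundryvtt:%s\n' \
    "${data_dir}" "${foundry_version}" "${image_tag}"
}

# Older container image, newer Foundry release.
foundry_mismatch_cmd 10.291.1 11.300 "${PWD}/data"
```

The printed line can be copied into a terminal once the data path looks right; paths containing spaces would need quoting first.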

Normally this would work fine. You’d just have a mismatch between the container and Foundry which means you might not have access to specifying some options via environment variables. You will see a warning like this in the logs:

foundryvtt-mine-foundry-1  | Entrypoint | 2023-06-08 21:21:10 | [warn] FOUNDRY_VERSION has been manually set and does not match the container's version.
foundryvtt-mine-foundry-1  | Entrypoint | 2023-06-08 21:21:10 | [warn] Expected 10.291 but found 11.300
foundryvtt-mine-foundry-1  | Entrypoint | 2023-06-08 21:21:10 | [warn] The container may not function properly with this version mismatch.

I can confirm that this test works on my side.

If you still see the segfault then we know it’s something specific to Foundry v11. Then the debugging fun really begins. 😉

Alright, I just tried the above both with 11.300 in the 10.291.1 container and with the just-released 11.301, and in both instances the segfault is still thrown when trying to open the world. So it’s something specific to v11 and not the container…

Attached the latest log, from a completely fresh image. This is running on a Raspberry Pi 4, btw - so it is ARM rather than x86 based. Foundry won’t launch any world when running in a completely new container, but works fine running directly on my RPi4 host. I’m really leaning towards a container issue 😦 _foundryvtt-test_logs.txt

Ok, so I noticed something potentially of interest. Node is dying with exit code 139 (SIGSEGV) - and when I launched using the same data config from the plain Node FoundryVTT install, everything worked fine. As part of getting that to run, though, I had to get GLIBCXX 3.4.29 - and since I’m running on Bullseye (Raspbian) I had to upgrade to unstable (Bookworm) - then I could run FoundryVTT from Node non-containerized perfectly fine. What GLIBCXX is available within the container? I’m trying to get in with /bin/sh so I can check, but now that I migrated the default world, the container just keeps spinning and crashing lol. I’ll keep digging…
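
One way to answer the GLIBCXX question is to list the version strings embedded in the container’s libstdc++. The following is a minimal sketch under a few assumptions: the function names are made up for illustration, it needs a grep that supports `-a` and `-o` (GNU grep does), and the library path in the usage comment is the one from the Debian error log in this thread, so it will differ on other images.

```shell
#!/bin/sh
# List the GLIBCXX symbol versions embedded in a libstdc++ shared object.
# Usage: list_glibcxx /usr/lib/arm-linux-gnueabihf/libstdc++.so.6
list_glibcxx() {
  # -a treats the binary as text, -o prints only the matching part.
  LC_ALL=C grep -a -o 'GLIBCXX_[0-9][0-9.]*' "$1" | sort -u
}

# Succeed only if the library provides exactly the requested version.
# Usage: has_glibcxx LIBRARY 3.4.29
has_glibcxx() {
  list_glibcxx "$1" | grep -q -x -F "GLIBCXX_$2"
}
```

If `has_glibcxx` fails for `3.4.29`, a prebuilt native module that requires that version cannot be loaded against that library.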

Edit - Here’s the actual log where I go to start the world (now post-migration):

FoundryVTT | 2023-05-27 02:44:37 | [warn] The module "gm-screen" contains "dependencies" which is deprecated in favor of "relationships.requires"
Deprecated since Version 10
Backwards-compatible support will be removed in Version 13
Segmentation fault (core dumped)
Launcher | 2023-05-27 02:44:37 | [error] Node process exited with code 139

Entrypoint | 2023-05-27 02:44:39 | [debug] Timezone set to: UTC
Entrypoint | 2023-05-27 02:44:39 | [info] Starting felddy/foundryvtt container v11.299.0
Entrypoint | 2023-05-27 02:44:39 | [debug] CONTAINER_VERBOSE set.  Debug logging enabled.

BTW, can I get an invite to the discord channel?

Edit Part Deux - My concentration on the lock file was just a symptom. What’s happening here, is that upon launch of a world, Node is crashing with exit code 139 (SIGSEGV), then the container restarts. Upon startup, the lock file still exists from the prior crash and that is what I was seeing originally. So the “digging” really needs to be understanding the first 139 exit code from Node…
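
The 139 in the launcher log follows the usual shell convention of 128 plus the signal number, so 139 means signal 11. A small sketch for decoding such codes (the function name `describe_exit` is an assumption for illustration):

```shell
#!/bin/sh
# Decode a process exit status using the shell convention 128 + signal number.
# Usage: describe_exit 139
describe_exit() {
  code="$1"
  if [ "${code}" -gt 128 ]; then
    sig=$((code - 128))
    # `kill -l N` prints the name of signal N without the SIG prefix.
    printf 'exit %s: killed by signal %s (%s)\n' "${code}" "${sig}" "$(kill -l "${sig}")"
  else
    printf 'exit %s: process exited on its own\n' "${code}"
  fi
}

describe_exit 139
```

Because the process is killed by a signal rather than exiting normally, it never gets the chance to clean up after itself, which is consistent with the lock file being left behind for the next start.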

Note - Node 18.16.0 is what I used from the non-containerized test, and that matches what’s claimed in the container as well. Really thinking this may be a glibc issue…

Ok then, if that’s not you… good news, if you believe that misery loves company.

CaydenSworn#3518 is reporting the same issue in the #troubleshooting forum. I’d keep an eye on any resolutions they report.

I saw your issue that you opened to the foundryvtt project. Just be aware that there is a very strong container phobia with the support team. To the point that they will refuse to support any containerized foundry debugging. 🤷

I can’t think of any solutions or tests that you haven’t already run. Can you think of anything novel about your setup or configuration?