python-matter-server: `CHIP Error 0x000000C1: Endpoint pool full`
matter-server | Starting server: matter-server --storage-path /data --log-level debug
matter-server | 2023-04-17 18:56:54 clavius matter_server.server.stack[1] INFO Initializing CHIP/Matter Controller Stack...
matter-server | 2023-04-17 18:56:54 clavius matter_server.server.stack[1] DEBUG Using storage file: /data/chip.json
matter-server | [1681757814.754032][1:1] CHIP:CTL: Setting attestation nonce to random value
matter-server | [1681757814.754132][1:1] CHIP:CTL: Setting CSR nonce to random value
matter-server | [1681757814.754885][1:1] CHIP:DL: ChipLinuxStorage::Init: Using KVS config file: /tmp/chip_kvs
matter-server | [1681757814.755063][1:1] CHIP:DL: ChipLinuxStorage::Init: Using KVS config file: /data/chip_factory.ini
matter-server | [1681757814.755144][1:1] CHIP:DL: ChipLinuxStorage::Init: Using KVS config file: /data/chip_config.ini
matter-server | [1681757814.755205][1:1] CHIP:DL: ChipLinuxStorage::Init: Using KVS config file: /data/chip_counters.ini
matter-server | [1681757814.755381][1:1] CHIP:DL: writing settings to file (/data/chip_counters.ini-s943fj)
matter-server | [1681757814.755587][1:1] CHIP:DL: renamed tmp file to file (/data/chip_counters.ini)
matter-server | [1681757814.755605][1:1] CHIP:DL: NVS set: chip-counters/reboot-count = 55 (0x37)
matter-server | [1681757814.756826][1:1] CHIP:DL: Got Ethernet interface: enp7s0
matter-server | [1681757814.757681][1:1] CHIP:DL: Found the primary Ethernet interface:enp7s0
matter-server | [1681757814.759630][1:1] CHIP:DL: Failed to get WiFi interface
matter-server | [1681757814.759642][1:1] CHIP:DL: Failed to reset WiFi statistic counts
matter-server | 2023-04-17 18:56:54 clavius PersistentStorage[1] WARNING Initializing persistent storage from file: /data/chip.json
matter-server | 2023-04-17 18:56:54 clavius PersistentStorage[1] WARNING Loading configuration from /data/chip.json...
matter-server | 2023-04-17 18:56:54 clavius chip.IN[1] DEBUG UDP::Init bind&listen port=0
matter-server | 2023-04-17 18:56:54 clavius chip.IN[1] DEBUG UDP::Init bound to port=34368
matter-server | 2023-04-17 18:56:54 clavius chip.IN[1] DEBUG UDP::Init bind&listen port=0
matter-server | 2023-04-17 18:56:54 clavius chip.IN[1] DEBUG UDP::Init bound to port=59256
matter-server | 2023-04-17 18:56:54 clavius chip.IN[1] DEBUG BLEBase::Init - setting/overriding transport
matter-server | 2023-04-17 18:56:54 clavius chip.IN[1] DEBUG TransportMgr initialized
matter-server | 2023-04-17 18:56:54 clavius chip.FP[1] DEBUG Initializing FabricTable from persistent storage
matter-server | 2023-04-17 18:56:54 clavius chip.CTL[1] DEBUG StorageAdapter::GetKeyValue: Key = g/lkgt, Value = 0x7ffc2379dfe0 (18)
matter-server | 2023-04-17 18:56:54 clavius chip.CTL[1] DEBUG Key Found 8
matter-server |
matter-server | 2023-04-17 18:56:54 clavius chip.TS[1] INFO Last Known Good Time: 2023-04-17T10:31:32
matter-server | 2023-04-17 18:56:54 clavius chip.CTL[1] DEBUG StorageAdapter::GetKeyValue: Key = g/fidx, Value = 0x7ffc2379e1b0 (44)
matter-server | 2023-04-17 18:56:54 clavius chip.CTL[1] DEBUG Key Not Found
matter-server |
matter-server | 2023-04-17 18:56:54 clavius chip.CTL[1] DEBUG StorageAdapter::GetKeyValue: Key = g/fs/c, Value = 0x7ffc2379e050 (36)
matter-server | 2023-04-17 18:56:54 clavius chip.CTL[1] DEBUG Key Not Found
matter-server |
matter-server | 2023-04-17 18:56:54 clavius chip.CTL[1] DEBUG StorageAdapter::GetKeyValue: Key = g/gcc, Value = 0x7ffc2379e0ec (4)
matter-server | 2023-04-17 18:56:54 clavius chip.CTL[1] DEBUG Key Found 4
matter-server |
matter-server | 2023-04-17 18:56:54 clavius chip.CTL[1] DEBUG StorageAdapter::GetKeyValue: Key = g/gdc, Value = 0x7ffc2379e0ec (4)
matter-server | 2023-04-17 18:56:54 clavius chip.CTL[1] DEBUG Key Found 4
matter-server |
matter-server | 2023-04-17 18:56:54 clavius chip.CTL[1] DEBUG StorageAdapter::SetKeyValue: Key = g/gcc, Value = 0x7ffc2379e0ec (4)
matter-server | 2023-04-17 18:56:54 clavius PersistentStorage[1] INFO SetSdkKey: g/gcc = b'\xd8\xd6\x00\x00'
matter-server | 2023-04-17 18:56:54 clavius PersistentStorage[1] INFO Committing...
matter-server | 2023-04-17 18:56:54 clavius chip.CTL[1] DEBUG StorageAdapter::SetKeyValue: Key = g/gdc, Value = 0x7ffc2379e0ec (4)
matter-server | 2023-04-17 18:56:54 clavius PersistentStorage[1] INFO SetSdkKey: g/gdc = b'\xd8\xd6\x00\x00'
matter-server | 2023-04-17 18:56:54 clavius PersistentStorage[1] INFO Committing...
matter-server | 2023-04-17 18:56:54 clavius chip.ZCL[1] INFO Using ZAP configuration...
matter-server | Traceback (most recent call last):
matter-server | File "/usr/local/bin/matter-server", line 8, in <module>
matter-server | sys.exit(main())
matter-server | File "/usr/local/lib/python3.10/site-packages/matter_server/server/__main__.py", line 79, in main
matter-server | server = MatterServer(
matter-server | File "/usr/local/lib/python3.10/site-packages/matter_server/server/server.py", line 74, in __init__
matter-server | self.stack = MatterStack(self)
matter-server | File "/usr/local/lib/python3.10/site-packages/matter_server/server/stack.py", line 32, in __init__
matter-server | self._chip_stack = ChipStack(
matter-server | File "/usr/local/lib/python3.10/site-packages/chip/ChipStack.py", line 68, in wrapper
matter-server | instance[0] = cls(*args, **kwargs)
matter-server | File "/usr/local/lib/python3.10/site-packages/chip/ChipStack.py", line 273, in __init__
matter-server | res.raise_on_error()
matter-server | File "/usr/local/lib/python3.10/site-packages/chip/native/__init__.py", line 66, in raise_on_error
matter-server | raise self.to_exception()
matter-server | chip.exceptions.ChipStackError: ../src/system/SystemLayerImplSelect.cpp:262: CHIP Error 0x000000C1: Endpoint pool full
docker-compose.yml:
matter-server:
  build:
    context: ./python-matter-server/
    dockerfile: Dockerfile.dev
  image: matter-server:latest
  command: --log-level debug
  container_name: matter-server
  restart: unless-stopped
  # Required for mDNS to work correctly
  network_mode: host
  # security_opt:
  #   # Needed for Bluetooth via dbus
  #   - apparmor:unconfined
  volumes:
    - ./matter-server-data:/data/
    # - /run/dbus:/run/dbus:ro
The host has neither WiFi nor Bluetooth capability, only the correctly detected Ethernet interface alongside a lot of virtual ones.
Repo is cloned to ./python-matter-server where tag 3.3.0 is checked out.
I couldn’t find anything about the error in the library repo, so I came to ask about it here first. I’m not sure what kind of “Endpoint pool” it might be that’s full. Port 5580 is unused.
Running without docker, after manually creating /data and using it as storage path per some older issues on this tracker, results in the same crash.
About this issue
- State: closed
- Created a year ago
- Reactions: 3
- Comments: 53 (11 by maintainers)
I’m here to comment that running 3.6.3 as recommended gives me the same error:
And keeps on restarting the Docker container.
I know this is very much in development, but asking people to run the matter server on a dedicated host because it can’t handle the presence of too many other things… kind of sounds like an implementation issue. Sorry if I sound rude.
If anyone is interested in my solution: https://gist.github.com/dasfuu/2f352aa2adbe0476633c642af72641a5
I set up the container using macvlan instead of `network_mode: host`.
For later readers with the same problem: setting up a working macvlan was more problematic than expected. I ran into the following two problems:
- `network_mode: host` couldn’t connect to the matter server using a macvlan interface.
The first issue is a peculiarity of macvlan devices: their host cannot communicate with them. This can be sidestepped with a small trick: two macvlan devices in bridge mode can talk to each other, so we use a second macvlan device as the connection origin to the one we want to reach. To make this happen, I run the following at every boot:
This was based roughly on this article.
The second issue was caused by router advertisements not being accepted on the macvlan device. Running `docker-compose exec matter-server ip -6 route` didn’t show a route to the IPv6 network advertised by my AppleTV. The routes were missing because I am no longer using my original `eth0`, where I accepted them. This can be enabled again with the right sysctls. My working docker-compose looks roughly like this:
This is mostly based on @dasfuu’s link above.
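Whether router advertisements are accepted on an interface can be checked before changing any sysctls. The following is a minimal Python sketch, assuming a Linux system that exposes the setting under `/proc/sys/net/ipv6/conf/<interface>/accept_ra`. The helper names and the interface name `eth0` are illustrative and do not come from the comment above.

```python
from pathlib import Path

# Meaning of the Linux accept_ra values, per the kernel's ip-sysctl docs.
ACCEPT_RA_MEANING = {
    0: "router advertisements are ignored",
    1: "accepted only while forwarding is disabled",
    2: "accepted even if forwarding is enabled",
}


def read_accept_ra(interface, proc_root="/proc/sys/net/ipv6/conf"):
    """Return the accept_ra value for `interface`, or None if unreadable."""
    path = Path(proc_root) / interface / "accept_ra"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def describe_accept_ra(value):
    """Turn an accept_ra value into a short human-readable description."""
    if value is None:
        return "accept_ra could not be read (no IPv6 or unknown interface)"
    return ACCEPT_RA_MEANING.get(value, f"unexpected value {value}")


if __name__ == "__main__":
    # "eth0" is a placeholder; use the interface name seen in the container.
    print("eth0:", describe_accept_ra(read_accept_ra("eth0")))
```

Run inside the container (for example via `docker-compose exec`), a value of 0 on the macvlan interface would be consistent with the missing IPv6 routes described above.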
Maybe it would be worthwhile to set `INET_CONFIG_NUM_*_ENDPOINTS` to a higher value (256?) until this is properly fixed upstream. For reference, I have ~25 Docker networks resulting in ~70 network devices. Either way, I don’t think this is fixed and we should reopen this ticket.
I’ve created an issue upstream: https://github.com/project-chip/connectedhomeip/issues/27007
From what I can tell, the relevant limits here are `INET_CONFIG_NUM_TCP_ENDPOINTS` and `INET_CONFIG_NUM_UDP_ENDPOINTS`. I am guessing that the new SDK uses up more endpoints, so the limit is reached sooner. I am guessing that the old SDK fails too, just at a later point.
@agners 20 running containers, 26 link entries (grep count like above)
I stopped and removed all containers, networks, and volumes, then rebooted the box.
With only the Home Assistant container running: 5 links (lo, eth, tailscale, docker, bridge)
matter-server 3.4.1 came up! Still says 5 links.
I tried stopping matter-server and bringing up containers one at a time, testing to see if matter-server would still come up afterwards. It was when I got to 23 links that matter-server failed. I tried stopping the last container and then matter-server would come up. I tried starting a different container to see if there was some weird issue and that didn’t make a difference.
If I leave enough stopped that matter-server can start, then start more, everything is fine. But then if I stop and try and restart it I get “Endpoint pool full” again.
Downgraded to 3.3.1 and it came right back up, 25 links (must have had a stale one before the reboot).
Something definitely changed with how the SDK is trying to get a socket.
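The link-counting experiment above can be repeated with a few lines of Python. This is a rough sketch, assuming a Unix-like host where `socket.if_nameindex()` is available. The threshold of 23 is only the count at which this commenter saw the affected versions fail, not a constant defined by the SDK, and the function names are made up for illustration.

```python
import socket

# Link count at which one commenter saw startup fail on the affected
# versions; an observation from this thread, not an SDK constant.
OBSERVED_FAILURE_LINKS = 23


def list_links():
    """Return the names of all network interfaces known to the OS."""
    return [name for _index, name in socket.if_nameindex()]


def startup_at_risk(link_names, threshold=OBSERVED_FAILURE_LINKS):
    """True if the number of links reaches the observed failure count."""
    return len(link_names) >= threshold


if __name__ == "__main__":
    links = list_links()
    print(f"{len(links)} links: {', '.join(links)}")
    if startup_at_risk(links):
        print("Link count is at or above the level where 'Endpoint pool full' was reported.")
```

The exact failure point probably differs between hosts, so the threshold is best treated as a starting point for comparison rather than a hard limit.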
3.3.0…3.4.0 diff since 3.4.0 crashes in the same way… looks like the only big changes were Python 4.5->4.6, the SDK, and the server running in a virtual environment (could that be part of it? I barely know anything about Python). I assume the other changes wouldn’t matter because the server isn’t even up yet?
I can’t fully say whether it works or not as I have only Thread-based Matter services in my home, which are primarily connected to Apple Home and added to HA in a secondary manner. However, when looking at the matter server bootup I do see discovery via mDNS, so I think it probably works just fine:
@dasfuu That worked, thanks! Now to find some matter devices to test.
Tried with the docker image on 3.4.2 and it seems to make it past where the endpoint pool error was previously:
However it won’t fully come up (`matter_server.common.errors.VersionMismatch: CHIP Core version does not match CHIP Clusters version`). I tried to guess and pip installed “home-assistant-chip-clusters==2023.6.dev104” but it didn’t like that either.
But it does make it further, so seems to be on the right track.
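Before guessing at package versions, the installed versions of the two wheels can be compared directly. The sketch below uses `importlib.metadata` from the standard library. It assumes that the server expects both packages at the same version string, which is an inference from the error message rather than something confirmed in this thread, and the helper names are hypothetical.

```python
from importlib import metadata

CORE_PKG = "home-assistant-chip-core"
CLUSTERS_PKG = "home-assistant-chip-clusters"


def installed_version(package):
    """Return the installed version of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def versions_match(core, clusters):
    """Assumed rule: both packages present with identical versions."""
    return core is not None and core == clusters


if __name__ == "__main__":
    core = installed_version(CORE_PKG)
    clusters = installed_version(CLUSTERS_PKG)
    print(f"{CORE_PKG}: {core}")
    print(f"{CLUSTERS_PKG}: {clusters}")
    print("match" if versions_match(core, clusters) else "MISMATCH")
```

Running this in the same environment as the server shows which of the two packages needs to be pinned to agree with the other.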
Well, it is obviously a new issue, and we can’t stay with Matter 1.0 forever. Rather, we have to fix whatever changed in Matter 1.1. To fix we need to understand what is wrong/what makes it fail exactly.
I was able to reproduce it by starting many containers on the same host.
It seems that a socket watch pool gets exhausted: https://github.com/project-chip/connectedhomeip/blob/v1.1.0.1/src/system/SystemLayerImplSelect.h#L100. From what I can tell, Matter used to have the same limit (64) in v1.0.0.2, so I’m not sure what is different with the new SDK.
I guess a temporary measure could be to just bump that number.
Ideally, we probably would somehow ignore the additional interfaces created by containers.
I’ll create an issue upstream and investigate temporarily bumping the limit.
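To illustrate why a fixed-size pool fails once enough interfaces exist, here is a toy Python model. It is not the SDK's implementation: the capacity of 64 is the limit mentioned above, while `per_interface` and `fixed_overhead` are hypothetical parameters, since the thread does not establish how many pool entries the stack uses per interface.

```python
class EndpointPoolFull(RuntimeError):
    """Raised when the toy pool has no free slots left."""


class FixedPool:
    """A pool with a hard capacity, loosely modelling a static array."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.in_use = 0

    def acquire(self):
        if self.in_use >= self.capacity:
            raise EndpointPoolFull("Endpoint pool full")
        self.in_use += 1


def max_interfaces(capacity, per_interface, fixed_overhead=0):
    """Largest interface count that still fits into the pool."""
    return (capacity - fixed_overhead) // per_interface


def try_startup(interfaces, capacity=64, per_interface=1, fixed_overhead=0):
    """Simulate startup; per_interface and fixed_overhead are assumptions."""
    pool = FixedPool(capacity)
    try:
        for _ in range(fixed_overhead + interfaces * per_interface):
            pool.acquire()
    except EndpointPoolFull:
        return False
    return True


if __name__ == "__main__":
    # Hypothetical: three pool entries per interface.
    for count in (5, 21, 22, 70):
        print(count, "interfaces ->", try_startup(count, per_interface=3))
```

In this model, raising the capacity or skipping container-created interfaces both keep startup below the limit, which corresponds to the two options discussed above.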
That is correct, it sounds like an issue with the SDK itself. Maybe it’s acquiring too many sockets or something. @agners what can we do to better debug this issue? Can we reproduce it somehow on a setup with many running containers? If we can somehow pinpoint it (or at least reproduce it), we can check upstream.
One more datapoint maybe more helpful than the diff:
If I check out 3.3.1 and change nothing except requiring 2023.5.2 instead of 2023.4.1 for home-assistant-chip-clusters and home-assistant-chip-core in pyproject.toml, I also get the “Endpoint pool full” error.
I don’t think this bug actually has anything to do with code in this repo.
having the same issue, downgrading to v3.3.0 until a fix is provided. running on docker as others. have bt and wifi, both enabled and passed to the container
EDIT: 3.3.0 works fine, will stay on it until the issue is fixed. ran all versions without issues since 1.2.0, just replaced containers with no params or changed settings/configs. it’s not something on my setup, it’s an issue with 3.4.0 and 3.4.1.
logs:
Yeah, but you have the same issue, so let’s track progress here for everyone with the same issue. EDIT: updated the above comment with the correct username