python-matter-server: `CHIP Error 0x000000C1: Endpoint pool full`

matter-server  | Starting server: matter-server --storage-path /data --log-level debug
matter-server  | 2023-04-17 18:56:54 clavius matter_server.server.stack[1] INFO Initializing CHIP/Matter Controller Stack...
matter-server  | 2023-04-17 18:56:54 clavius matter_server.server.stack[1] DEBUG Using storage file: /data/chip.json
matter-server  | [1681757814.754032][1:1] CHIP:CTL: Setting attestation nonce to random value
matter-server  | [1681757814.754132][1:1] CHIP:CTL: Setting CSR nonce to random value
matter-server  | [1681757814.754885][1:1] CHIP:DL: ChipLinuxStorage::Init: Using KVS config file: /tmp/chip_kvs
matter-server  | [1681757814.755063][1:1] CHIP:DL: ChipLinuxStorage::Init: Using KVS config file: /data/chip_factory.ini
matter-server  | [1681757814.755144][1:1] CHIP:DL: ChipLinuxStorage::Init: Using KVS config file: /data/chip_config.ini
matter-server  | [1681757814.755205][1:1] CHIP:DL: ChipLinuxStorage::Init: Using KVS config file: /data/chip_counters.ini
matter-server  | [1681757814.755381][1:1] CHIP:DL: writing settings to file (/data/chip_counters.ini-s943fj)
matter-server  | [1681757814.755587][1:1] CHIP:DL: renamed tmp file to file (/data/chip_counters.ini)
matter-server  | [1681757814.755605][1:1] CHIP:DL: NVS set: chip-counters/reboot-count = 55 (0x37)
matter-server  | [1681757814.756826][1:1] CHIP:DL: Got Ethernet interface: enp7s0
matter-server  | [1681757814.757681][1:1] CHIP:DL: Found the primary Ethernet interface:enp7s0
matter-server  | [1681757814.759630][1:1] CHIP:DL: Failed to get WiFi interface
matter-server  | [1681757814.759642][1:1] CHIP:DL: Failed to reset WiFi statistic counts
matter-server  | 2023-04-17 18:56:54 clavius PersistentStorage[1] WARNING Initializing persistent storage from file: /data/chip.json
matter-server  | 2023-04-17 18:56:54 clavius PersistentStorage[1] WARNING Loading configuration from /data/chip.json...
matter-server  | 2023-04-17 18:56:54 clavius chip.IN[1] DEBUG UDP::Init bind&listen port=0
matter-server  | 2023-04-17 18:56:54 clavius chip.IN[1] DEBUG UDP::Init bound to port=34368
matter-server  | 2023-04-17 18:56:54 clavius chip.IN[1] DEBUG UDP::Init bind&listen port=0
matter-server  | 2023-04-17 18:56:54 clavius chip.IN[1] DEBUG UDP::Init bound to port=59256
matter-server  | 2023-04-17 18:56:54 clavius chip.IN[1] DEBUG BLEBase::Init - setting/overriding transport
matter-server  | 2023-04-17 18:56:54 clavius chip.IN[1] DEBUG TransportMgr initialized
matter-server  | 2023-04-17 18:56:54 clavius chip.FP[1] DEBUG Initializing FabricTable from persistent storage
matter-server  | 2023-04-17 18:56:54 clavius chip.CTL[1] DEBUG StorageAdapter::GetKeyValue: Key = g/lkgt, Value = 0x7ffc2379dfe0 (18)
matter-server  | 2023-04-17 18:56:54 clavius chip.CTL[1] DEBUG Key Found 8
matter-server  | 
matter-server  | 2023-04-17 18:56:54 clavius chip.TS[1] INFO Last Known Good Time: 2023-04-17T10:31:32
matter-server  | 2023-04-17 18:56:54 clavius chip.CTL[1] DEBUG StorageAdapter::GetKeyValue: Key = g/fidx, Value = 0x7ffc2379e1b0 (44)
matter-server  | 2023-04-17 18:56:54 clavius chip.CTL[1] DEBUG Key Not Found
matter-server  | 
matter-server  | 2023-04-17 18:56:54 clavius chip.CTL[1] DEBUG StorageAdapter::GetKeyValue: Key = g/fs/c, Value = 0x7ffc2379e050 (36)
matter-server  | 2023-04-17 18:56:54 clavius chip.CTL[1] DEBUG Key Not Found
matter-server  | 
matter-server  | 2023-04-17 18:56:54 clavius chip.CTL[1] DEBUG StorageAdapter::GetKeyValue: Key = g/gcc, Value = 0x7ffc2379e0ec (4)
matter-server  | 2023-04-17 18:56:54 clavius chip.CTL[1] DEBUG Key Found 4
matter-server  | 
matter-server  | 2023-04-17 18:56:54 clavius chip.CTL[1] DEBUG StorageAdapter::GetKeyValue: Key = g/gdc, Value = 0x7ffc2379e0ec (4)
matter-server  | 2023-04-17 18:56:54 clavius chip.CTL[1] DEBUG Key Found 4
matter-server  | 
matter-server  | 2023-04-17 18:56:54 clavius chip.CTL[1] DEBUG StorageAdapter::SetKeyValue: Key = g/gcc, Value = 0x7ffc2379e0ec (4)
matter-server  | 2023-04-17 18:56:54 clavius PersistentStorage[1] INFO SetSdkKey: g/gcc = b'\xd8\xd6\x00\x00'
matter-server  | 2023-04-17 18:56:54 clavius PersistentStorage[1] INFO Committing...
matter-server  | 2023-04-17 18:56:54 clavius chip.CTL[1] DEBUG StorageAdapter::SetKeyValue: Key = g/gdc, Value = 0x7ffc2379e0ec (4)
matter-server  | 2023-04-17 18:56:54 clavius PersistentStorage[1] INFO SetSdkKey: g/gdc = b'\xd8\xd6\x00\x00'
matter-server  | 2023-04-17 18:56:54 clavius PersistentStorage[1] INFO Committing...
matter-server  | 2023-04-17 18:56:54 clavius chip.ZCL[1] INFO Using ZAP configuration...
matter-server  | Traceback (most recent call last):
matter-server  |   File "/usr/local/bin/matter-server", line 8, in <module>
matter-server  |     sys.exit(main())
matter-server  |   File "/usr/local/lib/python3.10/site-packages/matter_server/server/__main__.py", line 79, in main
matter-server  |     server = MatterServer(
matter-server  |   File "/usr/local/lib/python3.10/site-packages/matter_server/server/server.py", line 74, in __init__
matter-server  |     self.stack = MatterStack(self)
matter-server  |   File "/usr/local/lib/python3.10/site-packages/matter_server/server/stack.py", line 32, in __init__
matter-server  |     self._chip_stack = ChipStack(
matter-server  |   File "/usr/local/lib/python3.10/site-packages/chip/ChipStack.py", line 68, in wrapper
matter-server  |     instance[0] = cls(*args, **kwargs)
matter-server  |   File "/usr/local/lib/python3.10/site-packages/chip/ChipStack.py", line 273, in __init__
matter-server  |     res.raise_on_error()
matter-server  |   File "/usr/local/lib/python3.10/site-packages/chip/native/__init__.py", line 66, in raise_on_error
matter-server  |     raise self.to_exception()
matter-server  | chip.exceptions.ChipStackError: ../src/system/SystemLayerImplSelect.cpp:262: CHIP Error 0x000000C1: Endpoint pool full

docker-compose.yml:

  matter-server:
    build:
      context: ./python-matter-server/
      dockerfile: Dockerfile.dev
    image: matter-server:latest
    command: --log-level debug
    container_name: matter-server
    restart: unless-stopped
    # Required for mDNS to work correctly
    network_mode: host
#    security_opt:
#      # Needed for Bluetooth via dbus
#      - apparmor:unconfined
    volumes:
      - ./matter-server-data:/data/
#      - /run/dbus:/run/dbus:ro

The host has neither WiFi nor Bluetooth capability, only the correctly detected Ethernet interface alongside a lot of virtual ones.

Repo is cloned to ./python-matter-server where tag 3.3.0 is checked out.

I couldn’t find anything about the error in the library repo, so I came to ask about it here first. I’m not sure what kind of “Endpoint pool” it might be that’s full. Port 5580 is unused.

Running without docker, after manually creating /data and using it as storage path per some older issues on this tracker, results in the same crash.

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 3
  • Comments: 53 (11 by maintainers)

Most upvoted comments

I’m here to comment that running 3.6.3 as recommended gives me the same error:

chip.exceptions.ChipStackError: src/system/SystemLayerImplSelect.cpp:268: CHIP Error 0x000000C1: Endpoint pool full

And keeps on restarting the Docker container.

So far this issue (endpoint pool full) has been reported by people either running a crazy amount of containers on their host or using some unusual network configuration (e.g. virtualization with a faulty NIC driver and/or promiscuous mode).

I know this is very much in development, but asking people to run the matter server on a dedicated host because it can’t handle the presence of too many other things… kind of sounds like an implementation issue. Sorry if I sound rude.

If anyone is interested in my solution: https://gist.github.com/dasfuu/2f352aa2adbe0476633c642af72641a5

I set up the container using macvlan instead of network_mode: host.

For later readers with the same problem: Setting up a working macvlan was more problematic than expected. I ran into the following two problems:

  1. Home Assistant running with network_mode: host couldn’t connect to the matter server using a macvlan interface.
  2. Setting up a matter device failed without a meaningful error message.

The first issue is a peculiarity of macvlan devices: their host cannot communicate with them. This can be sidestepped with a small trick. Two macvlan devices in bridge mode can talk to each other, so we create a second macvlan device on the host and use it as the connection origin for the one we want to reach. To make this happen, I run the following at every boot:

# In my case, I use 192.168.0.0/16 in my network. I create 192.168.2.99 as origin to reach the
# matter server's macvlan under 192.168.2.3/32 .
ip link add macvlan-shim link eth0 type macvlan mode bridge
ip addr add 192.168.2.99/32 dev macvlan-shim
ip link set macvlan-shim up

# Use the macvlan-shim to reach 192.168.2.3
ip route add 192.168.2.3/32 dev macvlan-shim

This was based roughly on this article.
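Because the shim has to be recreated at every boot, it can help to generate the commands from parameters rather than hard-coding them. The following is only a sketch: the function name `shim_commands` and its arguments are made up for illustration, it only builds the command strings shown above, and it does not run them.

```python
import ipaddress


def shim_commands(parent, shim_name, shim_ip, target_ip):
    """Build the `ip` commands for a host-side macvlan shim.

    Hypothetical helper: it mirrors the four commands quoted in this
    comment and only validates that both values are IPv4 addresses.
    """
    shim = ipaddress.IPv4Address(shim_ip)      # raises ValueError if invalid
    target = ipaddress.IPv4Address(target_ip)  # raises ValueError if invalid
    if shim == target:
        raise ValueError("shim and target must use different addresses")
    return [
        f"ip link add {shim_name} link {parent} type macvlan mode bridge",
        f"ip addr add {shim}/32 dev {shim_name}",
        f"ip link set {shim_name} up",
        # Route only the matter server's address through the shim.
        f"ip route add {target}/32 dev {shim_name}",
    ]


if __name__ == "__main__":
    for cmd in shim_commands("eth0", "macvlan-shim", "192.168.2.99", "192.168.2.3"):
        print(cmd)
```

The printed lines can be pasted into a boot script or a systemd unit. Running them still requires root.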

The second issue was caused by router advertisements not being accepted on the macvlan device. Running docker-compose exec matter-server ip -6 route didn’t show a route to the IPv6 network advertised by my AppleTV. The routes were missing because I am no longer using my original eth0, where I had accepted them. This can be enabled again with the right sysctls. My working docker-compose looks roughly like this:

version: '3'
services:
  homeassistant:
    container_name: homeassistant
    image: "ghcr.io/home-assistant/home-assistant:stable"
    logging:
      driver: journald
    volumes:
      - ./config:/config
      - /etc/localtime:/etc/localtime:ro
    restart: unless-stopped
    network_mode: host
    user: 1009:1012

  matter-server:
    image: ghcr.io/home-assistant-libs/python-matter-server:4
    container_name: matter-server
    restart: unless-stopped
    security_opt:
      # Needed for Bluetooth via dbus
      - apparmor:unconfined
    volumes:
      - ./matter-data/:/data/
      - /run/dbus:/run/dbus:ro
    dns:
      - "192.168.0.2"
    networks:
      dockervlan:
        ipv4_address: 192.168.2.3
        ipv6_address: "fdf9:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX"
    sysctls:
      - net.ipv6.conf.eth0.accept_ra=2 # note that the eth0 here is NOT the same eth0 outside the Docker namespace.
      - net.ipv6.conf.eth0.accept_ra_rt_info_max_plen=64

networks:
  dockervlan:
    driver: macvlan
    enable_ipv6: true
    driver_opts:
      parent: eth0
    ipam:
      config:
        - subnet: "192.168.0.0/16"
          gateway: "192.168.0.1"
        - subnet: "fdf9:XXXX:XXXX:XXXX::/64" # my ipv6 subnet, NOT the one of the AppleTV, just the ULA addresses I use.

This is mostly based on @dasfuu's link above.
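To confirm that the router advertisements are now accepted, the output of `ip -6 route` inside the container can be checked for the expected prefix. The sketch below assumes the output has already been captured as text (for example via the docker-compose exec call mentioned above). The function name and the sample prefixes are hypothetical, and default routes are deliberately ignored.

```python
import ipaddress


def has_route_for(route_output, prefix):
    """Return True if some non-default route in `ip -6 route` output covers `prefix`.

    Illustrative only: lines whose first field is not a network
    (e.g. "default via ...") are skipped.
    """
    wanted = ipaddress.ip_network(prefix, strict=False)
    for line in route_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            net = ipaddress.ip_network(fields[0], strict=False)
        except ValueError:
            continue  # "default", "unreachable", and similar keywords
        if net.version == wanted.version and wanted.subnet_of(net):
            return True
    return False


if __name__ == "__main__":
    # Made-up sample output; real prefixes will differ.
    sample = (
        "fd12:3456:789a:1::/64 dev eth0 proto ra metric 1024 pref medium\n"
        "fe80::/64 dev eth0 proto kernel metric 256 pref medium\n"
        "default via fe80::1 dev eth0 proto ra metric 1024 pref medium\n"
    )
    print(has_route_for(sample, "fd12:3456:789a:1::/64"))
```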

Maybe it would be worthwhile to set INET_CONFIG_NUM_*_ENDPOINTS to a higher value (256?) until this is properly fixed upstream. For reference, I have ~25 Docker networks resulting in ~70 network devices. Either way, I don’t think this is fixed and we should reopen this ticket.

I’ve created an issue upstream: https://github.com/project-chip/connectedhomeip/issues/27007

From what I can tell, the relevant limits here are INET_CONFIG_NUM_TCP_ENDPOINTS and INET_CONFIG_NUM_UDP_ENDPOINTS. I am guessing that the new SDK uses up more endpoints, which makes the limit lower. I am guessing that the old SDK fails too, just at a later point.

@agners 20 running containers, 26 link entries (grep count like above)

I stopped and removed all containers, networks, and volumes, then rebooted the box.

With only the Home Assistant container running: 5 links (lo, eth, tailscale, docker, bridge)

matter-server 3.4.1 came up! Still says 5 links.

I tried stopping matter-server and bringing up containers one at a time, testing to see if matter-server would still come up afterwards. It was when I got to 23 links that matter-server failed. I tried stopping the last container and then matter-server would come up. I tried starting a different container to see if there was some weird issue and that didn’t make a difference.

If I leave enough stopped that matter-server can start, then start more, everything is fine. But then if I stop and try and restart it I get “Endpoint pool full” again.

Downgraded to 3.3.1 and it came right back up, 25 links (must have had a stale one before the reboot).

Something definitely changed with how the SDK is trying to get a socket.
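The link counts quoted in this comment can be reproduced without `ip` and `grep` by asking Python for the interface list. This is a rough sketch: the threshold of 23 is simply the value observed on this one host, not a documented limit, and both function names are made up for illustration.

```python
import socket


def count_links(names=None):
    """Count network interfaces; uses the live system list if none is given."""
    if names is None:
        # socket.if_nameindex() returns a list of (index, name) pairs.
        names = [name for _index, name in socket.if_nameindex()]
    return len(names)


def at_or_above_observed_limit(link_count, observed_limit=23):
    """Compare against the count at which startup failed in this report.

    `observed_limit` is an assumption taken from a single machine.
    """
    return link_count >= observed_limit


if __name__ == "__main__":
    n = count_links()
    print(f"{n} links, at or above observed limit: {at_or_above_observed_limit(n)}")
```

Run it on the host (or in a container with network_mode: host) before starting matter-server to see how close the machine is to the count that failed here.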

3.3.0…3.4.0 diff since 3.4.0 crashes in the same way… looks like the only big changes were Python 4.5->4.6, the SDK, and the server running in a virtual environment (could that be part of it? I barely know anything about Python). I assume the other changes wouldn’t matter because the server isn’t even up yet?

I can’t fully say whether it works or not, as I have only Thread-based Matter services in my home, which are primarily connected to Apple Home and added to HA in a secondary manner. However, when looking at the matter server bootup I do see discovery via mDNS, so I think it probably works just fine:

matter  | 2024-02-12 13:45:29 2475efcf2cc2 matter_server.server.device_controller[1] INFO Node 7 discovered on MDNS
matter  | 2024-02-12 13:45:29 2475efcf2cc2 matter_server.server.device_controller.[node 7][1] INFO Setting up attributes and events subscription.
matter  | 2024-02-12 13:45:29 2475efcf2cc2 matter_server.server.vendor_info[1] INFO Fetched 158 vendors from DCL.
matter  | 2024-02-12 13:45:29 2475efcf2cc2 matter_server.server.vendor_info[1] INFO Saving vendor info to storage.
matter  | 2024-02-12 13:45:33 2475efcf2cc2 matter_server.server.device_controller.[node 7][1] INFO Subscription succeeded

@dasfuu That worked, thanks! Now to find some matter devices to test.

Tried with the docker image on 3.4.2 and it seems to make it past where the endpoint pool error was previously:

matter-server  | 2023-06-02 04:53:35 citywall PersistentStorage[1] INFO Committing...
matter-server  | 2023-06-02 04:53:35 citywall chip.ZCL[1] INFO Using ZAP configuration...
matter-server  | 2023-06-02 04:53:35 citywall chip.DL[1] ERROR MDNS failed to join multicast group on vethebeb55f for address type IPv4: src/inet/UDPEndPointImplSockets.cpp:764: Inet Error 0x00000110: Address not found
matter-server  | 2023-06-02 04:53:35 citywall chip.DL[1] ERROR MDNS failed to join multicast group on vethbc94dbe for address type IPv4: src/inet/UDPEndPointImplSockets.cpp:764: Inet Error 0x00000110: Address not found
matter-server  | 2023-06-02 04:53:35 citywall chip.DL[1] ERROR MDNS failed to join multicast group on vethbcea58b for address type IPv4: src/inet/UDPEndPointImplSockets.cpp:764: Inet Error 0x00000110: Address not found

However it won’t fully come up (matter_server.common.errors.VersionMismatch: CHIP Core version does not match CHIP Clusters version). I tried to guess and pip installed “home-assistant-chip-clusters==2023.6.dev104” but it didn’t like that either.
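Before guessing at a pip install, it may be quicker to print which versions of the two CHIP wheels are actually installed. The sketch below assumes the two packages are meant to carry the same version string, as the pyproject.toml pins mentioned elsewhere in this thread suggest. How the server itself performs the check is not shown here, and the helper names are hypothetical.

```python
from importlib import metadata

CORE = "home-assistant-chip-core"
CLUSTERS = "home-assistant-chip-clusters"


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def versions_match(core_version, clusters_version):
    """Assumption: a healthy install has identical version strings for both."""
    return core_version is not None and core_version == clusters_version


if __name__ == "__main__":
    core = installed_version(CORE)
    clusters = installed_version(CLUSTERS)
    print(f"{CORE}: {core}")
    print(f"{CLUSTERS}: {clusters}")
    print("match" if versions_match(core, clusters) else "MISMATCH")
```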

But it does make it further, so seems to be on the right track.

why does this matter as older versions work without issues?

Well, it is obviously a new issue, and we can’t stay with Matter 1.0 forever. Rather, we have to fix whatever changed in Matter 1.1. To fix we need to understand what is wrong/what makes it fail exactly.

Can we reproduce it somehow on a setup with many running containers?

I was able to reproduce it by starting many containers on the same host.

It seems that a socket watch pool gets exhausted: https://github.com/project-chip/connectedhomeip/blob/v1.1.0.1/src/system/SystemLayerImplSelect.h#L100. From what I can tell, Matter used to have the same limit (64) in v1.0.0.2, so I am not sure what is different with the new SDK.

I guess a temporary measure could be to just bump that number.

Ideally, we probably would somehow ignore the additional interfaces created by containers.

I’ll create an issue upstream and investigate temporarily bumping the limit.
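The idea of ignoring container-created interfaces could look roughly like the following. This is not how the SDK selects interfaces. It is only a sketch of name-based filtering, and the prefix list is an assumption based on Docker's usual naming (veth pairs, br- bridges, docker0).

```python
# Assumed prefixes for interfaces that Docker typically creates.
CONTAINER_PREFIXES = ("veth", "br-", "docker")


def split_interfaces(names, prefixes=CONTAINER_PREFIXES):
    """Split interface names into (kept, ignored) by prefix.

    Hypothetical helper: names starting with one of `prefixes` are
    treated as container plumbing and ignored.
    """
    kept, ignored = [], []
    for name in names:
        # str.startswith accepts a tuple of candidate prefixes.
        (ignored if name.startswith(prefixes) else kept).append(name)
    return kept, ignored


if __name__ == "__main__":
    kept, ignored = split_interfaces(
        ["lo", "enp7s0", "docker0", "br-1a2b3c", "vethebeb55f"]
    )
    print("kept:", kept)
    print("ignored:", ignored)
```

Name-based filtering is fragile, since interfaces can be renamed, so an allow-list of interfaces chosen by the user would probably be a more robust design.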

That is correct, it sounds like an issue with the SDK itself. Maybe it’s acquiring too many sockets or something. @agners what can we do to better debug this issue? Can we reproduce it somehow on a setup with many running containers? If we can somehow pinpoint it (or at least reproduce it), we can check upstream.

One more datapoint maybe more helpful than the diff:

If I check out 3.3.1 and change nothing except requiring 2023.5.2 instead of 2023.4.1 for home-assistant-chip-clusters and home-assistant-chip-core in pyproject.toml, I also get the “Endpoint pool full” error.

I don’t think this bug actually has anything to do with code in this repo.

having the same issue, downgrading to v3.3.0 until a fix is provided. running on docker as others. have bt and wifi, both enabled and passed to the container

EDIT: 3.3.0 works fine, will stay on it until the issue is fixed. ran all versions without issues since 1.2.0, just replaced containers with no params or changed settings/configs. it’s not something on my setup, it’s an issue with 3.4.0 and 3.4.1.

logs:

[1685136051.235045][1:1] CHIP:DL: Found the primary Ethernet interface:enp88s0
[1685136051.235664][1:1] CHIP:DL: Got WiFi interface: wlo1
[1685136051.235674][1:1] CHIP:DL: Failed to reset WiFi statistic counts
2023-05-27 00:20:51 RHomeServer PersistentStorage[1] WARNING Initializing persistent storage from file: /data/chip.json
2023-05-27 00:20:51 RHomeServer PersistentStorage[1] WARNING Loading configuration from /data/chip.json...
2023-05-27 00:20:51 RHomeServer chip.TS[1] INFO Last Known Good Time: 2023-05-23T12:00:32
2023-05-27 00:20:51 RHomeServer chip.FP[1] INFO Fabric index 0x1 was retrieved from storage. Compressed FabricId 0xD28850A28625BD87, FabricId 0x0000000000000001, NodeId 0x000000000001B669, VendorId 0xFFF1
2023-05-27 00:20:51 RHomeServer PersistentStorage[1] INFO SetSdkKey: g/gcc = b'\xe8\x0c=\x00'
2023-05-27 00:20:51 RHomeServer PersistentStorage[1] INFO Committing...
2023-05-27 00:20:51 RHomeServer PersistentStorage[1] INFO SetSdkKey: g/gdc = b'\xe8\x0c=\x00'
2023-05-27 00:20:51 RHomeServer PersistentStorage[1] INFO Committing...
2023-05-27 00:20:51 RHomeServer chip.ZCL[1] INFO Using ZAP configuration...
Traceback (most recent call last):
  File "/usr/local/bin/matter-server", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/matter_server/server/__main__.py", line 79, in main
    server = MatterServer(
  File "/usr/local/lib/python3.10/site-packages/matter_server/server/server.py", line 74, in __init__
    self.stack = MatterStack(self)
  File "/usr/local/lib/python3.10/site-packages/matter_server/server/stack.py", line 32, in __init__
    self._chip_stack = ChipStack(
  File "/usr/local/lib/python3.10/site-packages/chip/ChipStack.py", line 64, in wrapper
    instance[0] = cls(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/chip/ChipStack.py", line 270, in __init__
    res.raise_on_error()
  File "/usr/local/lib/python3.10/site-packages/chip/native/__init__.py", line 67, in raise_on_error
    raise self.to_exception()
chip.exceptions.ChipStackError: src/system/SystemLayerImplSelect.cpp:268: CHIP Error 0x000000C1: Endpoint pool full

@marcelveldt thank you! Just FYI, this wasn’t my issue 😉 It was someone else’s.

Yeah, but you have the same issue, so let’s track progress here for everyone with the same issue. EDIT: updated the above comment with the correct username.