aiomqtt: High CPU usage

Hi,

I'm trying to get some help figuring out a problem I'm having, and have landed here. I'm running a script (ecowitt2mqtt) on a RPi 4 (Bullseye) that publishes data from my local weather station to my MQTT broker, where HA then discovers it. The script host (RPi 4), the broker (Ubuntu 22.04), and HA are three different instances on the same network. After hours or days, one of the cores on the RPi 4 running the script chokes and runs at 100% (or ~30% CPU total).

I ran py-spy on the instance and caught 700 errors, but nothing too conclusive. Running:

$ strace -p <pid> -f -s 4096

on the stuck process, yields this:

[pid 121259] recvfrom(14, 0x7f8683aaa0, 1, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 121259] epoll_pwait(3, [{EPOLLIN, {u32=13, u64=13}}], 1024, 76, NULL, 8) = 1
[pid 121259] recvfrom(14, 0x7f8683a1a0, 1, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 121259] epoll_pwait(3, [{EPOLLIN, {u32=13, u64=13}}], 1024, 76, NULL, 8) = 1
[pid 121259] recvfrom(14, 0x7f8683aaa0, 1, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 121259] epoll_pwait(3, [{EPOLLIN, {u32=13, u64=13}}], 1024, 76, NULL, 8) = 1
[pid 121259] recvfrom(14, 0x7f8683a1a0, 1, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 121259] epoll_pwait(3, [{EPOLLIN, {u32=13, u64=13}}], 1024, 76, NULL, 8) = 1
[pid 121259] recvfrom(14, 0x7f8683aaa0, 1, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 121259] epoll_pwait(3, [{EPOLLIN, {u32=13, u64=13}}], 1024, 75, NULL, 8) = 1
[pid 121259] recvfrom(14, 0x7f8683a1a0, 1, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 121259] epoll_pwait(3, [{EPOLLIN, {u32=13, u64=13}}], 1024, 75, NULL, 8) = 1
[pid 121259] recvfrom(14, 0x7f8683aaa0, 1, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 121259] epoll_pwait(3, [{EPOLLIN, {u32=13, u64=13}}], 1024, 75, NULL, 8) = 1
[pid 121259] recvfrom(14, 0x7f8683a1a0, 1, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 121259] epoll_pwait(3, [{EPOLLIN, {u32=13, u64=13}}], 1024, 74, NULL, 8) = 1
[pid 121259] recvfrom(14, 0x7f8683aaa0, 1, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 121259] epoll_pwait(3, [{EPOLLIN, {u32=13, u64=13}}], 1024, 74, NULL, 8) = 1
[pid 121259] recvfrom(14, 0x7f8683a1a0, 1, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 121259] epoll_pwait(3, [{EPOLLIN, {u32=13, u64=13}}], 1024, 74, NULL, 8) = 1
[pid 121259] recvfrom(14, 0x7f8683aaa0, 1, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 121259] epoll_pwait(3, [{EPOLLIN, {u32=13, u64=13}}], 1024, 74, NULL, 8) = 1
[pid 121259] recvfrom(14, 0x7f8683a1a0, 1, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 121259] epoll_pwait(3, [{EPOLLIN, {u32=13, u64=13}}], 1024, 73, NULL, 8) = 1
[pid 121259] recvfrom(14, 0x7f8683aaa0, 1, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 121259] epoll_pwait(3, [{EPOLLIN, {u32=13, u64=13}}], 1024, 73, NULL, 8) = 1
[pid 121259] recvfrom(14, ^Cstrace: Process 121259 detached

and then

$ lsof -p <pid> -n

yields these descriptors:

COMMAND      PID USER   FD      TYPE             DEVICE SIZE/OFF   NODE NAME
ecowitt2m 121259 root  cwd       DIR              179,2     4096      2 /
ecowitt2m 121259 root  rtd       DIR              179,2     4096      2 /
ecowitt2m 121259 root  txt       REG              179,2  5280744   1882 /usr/bin/python3.9
ecowitt2m 121259 root  mem       REG              179,2    15688  12246 /usr/lib/python3.9/lib-dynload/_multiprocessing.cpython-39-aarch64-linux-gnu.so
ecowitt2m 121259 root  mem       REG              179,2   192112   5267 /usr/lib/aarch64-linux-gnu/libmpdec.so.2.5.1
ecowitt2m 121259 root  mem       REG              179,2   163840  12240 /usr/lib/python3.9/lib-dynload/_decimal.cpython-39-aarch64-linux-gnu.so
ecowitt2m 121259 root  mem       REG              179,2    63376  12241 /usr/lib/python3.9/lib-dynload/_hashlib.cpython-39-aarch64-linux-gnu.so
ecowitt2m 121259 root  mem       REG              179,2    44568  12242 /usr/lib/python3.9/lib-dynload/_json.cpython-39-aarch64-linux-gnu.so
ecowitt2m 121259 root  mem       REG              179,2   350640 650642 /usr/local/lib/python3.9/dist-packages/Levenshtein/_levenshtein.cpython-39-aarch64-linux-gnu.so
ecowitt2m 121259 root  mem       REG              179,2  2127000   7468 /usr/local/lib/python3.9/dist-packages/_ruamel_yaml.cpython-39-aarch64-linux-gnu.so
ecowitt2m 121259 root  mem       REG              179,2    15304  12249 /usr/lib/python3.9/lib-dynload/_queue.cpython-39-aarch64-linux-gnu.so
ecowitt2m 121259 root  mem       REG              179,2    31592   2153 /usr/lib/aarch64-linux-gnu/librt-2.31.so
ecowitt2m 121259 root  mem       REG              179,2 11791880 650852 /usr/local/lib/python3.9/dist-packages/uvloop/loop.cpython-39-aarch64-linux-gnu.so
ecowitt2m 121259 root  mem       REG              179,2    30712   5387 /usr/lib/aarch64-linux-gnu/libuuid.so.1.3.0
ecowitt2m 121259 root  mem       REG              179,2     6240  12257 /usr/lib/python3.9/lib-dynload/_uuid.cpython-39-aarch64-linux-gnu.so
ecowitt2m 121259 root  mem       REG              179,2   154232   2100 /usr/lib/aarch64-linux-gnu/liblzma.so.5.2.5
ecowitt2m 121259 root  mem       REG              179,2    33144  12244 /usr/lib/python3.9/lib-dynload/_lzma.cpython-39-aarch64-linux-gnu.so
ecowitt2m 121259 root  mem       REG              179,2    70504   5115 /usr/lib/aarch64-linux-gnu/libbz2.so.1.0.4
ecowitt2m 121259 root  mem       REG              179,2    20032  12226 /usr/lib/python3.9/lib-dynload/_bz2.cpython-39-aarch64-linux-gnu.so
ecowitt2m 121259 root  mem       REG              179,2    62336  12225 /usr/lib/python3.9/lib-dynload/_asyncio.cpython-39-aarch64-linux-gnu.so
ecowitt2m 121259 root  mem       REG              179,2  2739952   8506 /usr/lib/aarch64-linux-gnu/libcrypto.so.1.1
ecowitt2m 121259 root  mem       REG              179,2   577176   8510 /usr/lib/aarch64-linux-gnu/libssl.so.1.1
ecowitt2m 121259 root  mem       REG              179,2   181184  12251 /usr/lib/python3.9/lib-dynload/_ssl.cpython-39-aarch64-linux-gnu.so
ecowitt2m 121259 root  mem       REG              179,2    51640   2147 /usr/lib/aarch64-linux-gnu/libnss_files-2.31.so
ecowitt2m 121259 root  mem       REG              179,2     6080  12233 /usr/lib/python3.9/lib-dynload/_contextvars.cpython-39-aarch64-linux-gnu.so
ecowitt2m 121259 root  mem       REG              179,2    10320  12247 /usr/lib/python3.9/lib-dynload/_opcode.cpython-39-aarch64-linux-gnu.so
ecowitt2m 121259 root  mem       REG              179,2  3041504   2580 /usr/lib/locale/locale-archive
ecowitt2m 121259 root  mem       REG              179,2  1458480   2140 /usr/lib/aarch64-linux-gnu/libc-2.31.so
ecowitt2m 121259 root  mem       REG              179,2   104824   5407 /usr/lib/aarch64-linux-gnu/libz.so.1.2.11
ecowitt2m 121259 root  mem       REG              179,2   161856   5161 /usr/lib/aarch64-linux-gnu/libexpat.so.1.6.12
ecowitt2m 121259 root  mem       REG              179,2   633000   2142 /usr/lib/aarch64-linux-gnu/libm-2.31.so
ecowitt2m 121259 root  mem       REG              179,2    14672   2155 /usr/lib/aarch64-linux-gnu/libutil-2.31.so
ecowitt2m 121259 root  mem       REG              179,2    14560   2141 /usr/lib/aarch64-linux-gnu/libdl-2.31.so
ecowitt2m 121259 root  mem       REG              179,2   160200   2151 /usr/lib/aarch64-linux-gnu/libpthread-2.31.so
ecowitt2m 121259 root  mem       REG              179,2   145352   2136 /usr/lib/aarch64-linux-gnu/ld-2.31.so
ecowitt2m 121259 root  mem       REG              179,2    27004   2448 /usr/lib/aarch64-linux-gnu/gconv/gconv-modules.cache
ecowitt2m 121259 root    0r      CHR                1,3      0t0      5 /dev/null
ecowitt2m 121259 root    1u     unix 0x00000000c8dfa60c      0t0 625905 type=STREAM
ecowitt2m 121259 root    2u     unix 0x00000000c8dfa60c      0t0 625905 type=STREAM
ecowitt2m 121259 root    3u  a_inode               0,13        0   7590 [eventpoll]
ecowitt2m 121259 root    4r     FIFO               0,12      0t0 625101 pipe
ecowitt2m 121259 root    5w     FIFO               0,12      0t0 625101 pipe
ecowitt2m 121259 root    6r     FIFO               0,12      0t0 625102 pipe
ecowitt2m 121259 root    7w     FIFO               0,12      0t0 625102 pipe
ecowitt2m 121259 root    8u  a_inode               0,13        0   7590 [eventfd]
ecowitt2m 121259 root    9u     unix 0x000000003d8f318e      0t0 625103 type=STREAM
ecowitt2m 121259 root   10u     unix 0x000000007373546a      0t0 625104 type=STREAM
ecowitt2m 121259 root   11u     IPv4             625114      0t0    TCP *:http-alt (LISTEN)
ecowitt2m 121259 root   12r      CHR                1,3      0t0      5 /dev/null
ecowitt2m 121259 root   13u     IPv4             630945      0t0    TCP 192.168.1.130:http-alt->192.168.1.138:55339 (CLOSE_WAIT)
ecowitt2m 121259 root   14u     IPv4             633219      0t0    TCP 192.168.1.130:55201->192.168.1.139:1883 (ESTABLISHED)

FYI: 192.168.1.130:55201 is the RPi running the script, and 192.168.1.139:1883 is my MQTT broker.
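
One possible reading of the two outputs together, offered as a hypothesis and not a confirmed diagnosis: epoll keeps reporting fd 13 (the HTTP socket stuck in CLOSE_WAIT) as readable, while the process responds by reading fd 14 (the MQTT socket), which has nothing to deliver and returns EAGAIN. With a level-triggered poller, that mismatch never clears, so the loop wakes up again immediately. The sketch below reproduces the pattern with the standard library only. The socket pairs and the function name are illustrative assumptions, not code from ecowitt2mqtt or asyncio-mqtt.

```python
import selectors
import socket


def count_spurious_wakeups(max_iterations: int = 5) -> int:
    """Watch one socket but read from another, and count the EAGAIN results."""
    sel = selectors.DefaultSelector()

    # "Stale" pair: the peer is closed, so stale_sock stays readable (EOF pending).
    stale_sock, stale_peer = socket.socketpair()
    # "Live" pair: the peer stays open and sends nothing, so there is nothing to read.
    live_sock, live_peer = socket.socketpair()
    stale_sock.setblocking(False)
    live_sock.setblocking(False)
    stale_peer.close()

    # The mismatch: the poller watches stale_sock, but the handler reads live_sock.
    sel.register(stale_sock, selectors.EVENT_READ)

    wakeups = 0
    try:
        for _ in range(max_iterations):
            events = sel.select(timeout=1.0)
            if not events:
                break  # The poller went quiet, so there is no busy loop.
            try:
                live_sock.recv(1)
            except BlockingIOError:
                # Same outcome as the EAGAIN lines in the strace output.
                wakeups += 1
    finally:
        sel.unregister(stale_sock)
        sel.close()
        for sock in (stale_sock, live_sock, live_peer):
            sock.close()
    return wakeups


if __name__ == "__main__":
    print(count_spurious_wakeups())
```

Every iteration returns from the poller at once and every read fails, because the pending EOF on the watched socket is never consumed. In a real event loop there is no iteration cap, which would match one core sitting at 100%.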

It has been suggested by someone much more knowledgeable than me that the issue could be here:

https://github.com/sbtinstruments/asyncio-mqtt/blob/6b02071227635fa532698b55c5159755f4e411b2/asyncio_mqtt/client.py#L524

I am running the latest version of asyncio-mqtt on the RPi.

Does anybody know why this resource becomes unavailable and chokes my RPi?

thanks!

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 17 (13 by maintainers)

Most upvoted comments

FYI, digging in and keeping a single connection open won’t work with ecowitt2mqtt’s architecture – since we’re a Uvicorn + FastAPI application, there’s no feasible way to have a reconnect “loop” (similar to the docs) because we publish when we receive REST API calls via FastAPI. Happy to go into more detail if interested, but more importantly, we’ll need to connect/disconnect with each payload. If that’s always going to spike CPU, I’m not certain we can do anything…

EDIT: I lied. 😂 https://github.com/bachya/ecowitt2mqtt/pull/236
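
For readers facing the same constraint, one general way to combine request-driven publishing with a single long-lived connection is to have request handlers put messages on an `asyncio.Queue` and let one background task own the connection and drain the queue. This is a minimal sketch under that assumption and not necessarily what the linked PR does. `publisher_worker`, `fake_publish`, and the topic names are hypothetical; in a real application the `publish` callable would wrap the MQTT client.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

Message = Tuple[str, bytes]


async def publisher_worker(
    queue: "asyncio.Queue[Message]",
    publish: Callable[[str, bytes], Awaitable[None]],
) -> None:
    """Drain the queue forever, publishing each message over one connection."""
    while True:
        topic, payload = await queue.get()
        try:
            await publish(topic, payload)
        finally:
            # Mark the item as handled so queue.join() can return.
            queue.task_done()


async def demo() -> List[Message]:
    sent: List[Message] = []

    async def fake_publish(topic: str, payload: bytes) -> None:
        # Stand-in for a real MQTT publish call.
        sent.append((topic, payload))

    queue: "asyncio.Queue[Message]" = asyncio.Queue()
    worker = asyncio.create_task(publisher_worker(queue, fake_publish))

    # This is what a request handler would do: enqueue and return.
    await queue.put(("weather/temp", b"21.5"))
    await queue.put(("weather/humidity", b"40"))

    await queue.join()  # Wait until both messages have been published.
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return sent


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The handlers never touch the connection directly, so connection setup and teardown happen in exactly one place.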

Sorry about the silence—I was on a short vacation.

Glad that you figured it out. Feel free to open new issues/discussions/PRs if you find something in asyncio-mqtt that you would like to add/change/fix. 👍

Ah, okay – I didn’t know that. I would expect that when the context manager ends, it closes everything nicely so the same object can be used again… Does something happen during Client init that can’t be re-done?

In general, context managers are single-use unless otherwise specified. As for asyncio_mqtt.Client, I simply don’t know if the current implementation is already reusable (in contrast to single-use). I doubt it (this issue itself indicates that the client is not reusable), but we need to test it to really find out. See #48 (and my comment). I’ll be glad to review a PR on the matter. 👍
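
To illustrate what "single-use" means here, the sketch below shows a hypothetical async context manager that refuses to be entered a second time. It makes no claim about how asyncio_mqtt.Client behaves internally. `SingleUseConnection` is an invented name used only to show the pattern. The standard library behaves similarly for objects created by `contextlib.contextmanager`, which are documented as single-use.

```python
import asyncio


class SingleUseConnection:
    """Hypothetical connection object that may be entered only once."""

    def __init__(self) -> None:
        self._entered = False
        self.connected = False

    async def __aenter__(self) -> "SingleUseConnection":
        if self._entered:
            # Failing loudly avoids reusing state left over from the first session.
            raise RuntimeError("this connection object cannot be re-entered")
        self._entered = True
        self.connected = True
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        self.connected = False
        return False  # Do not swallow exceptions raised in the body.


async def try_reuse() -> str:
    conn = SingleUseConnection()
    async with conn:
        pass  # First use works.
    try:
        async with conn:
            pass  # Second use on the same object.
    except RuntimeError:
        return "rejected"
    return "reused"


if __name__ == "__main__":
    print(asyncio.run(try_reuse()))
```

The safe default when reusability is unknown is to create a fresh object for each `async with` block.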

For what it’s worth, it does “work” in that most users successfully publish multiple messages with the same client (re-entered). Whether that’s correct practice (or causing issues under the surface) is obviously a different matter.

(2) is ultimately irrelevant, but what about (1)? Do I need to implement my own reconnection logic?

Yes, for now you would have to. My suggestion is to use a retry loop similar to that found in the advanced example in the readme file. For this to work, you must ensure that exceptions (e.g., due to network errors) propagate up to the retry loop. E.g., via an anyio.TaskGroup.
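
A retry loop of the kind described above can be sketched with the standard library alone. This is an illustration under stated assumptions and not the readme's code: `run_with_reconnect`, `ConnectionLost`, and `flaky_session` are hypothetical names. With asyncio-mqtt, the session coroutine would open a fresh `Client` in an `async with` block, and the exception to catch would be the library's `MqttError`.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class ConnectionLost(Exception):
    """Stand-in for the client library's connection error."""


async def run_with_reconnect(
    session: Callable[[], Awaitable[None]],
    retry_interval: float = 1.0,
    max_attempts: Optional[int] = None,
) -> int:
    """Run session() until it finishes cleanly; return the number of attempts."""
    attempts = 0
    while True:
        attempts += 1
        try:
            # Each attempt should build its own connection from scratch.
            await session()
            return attempts
        except ConnectionLost:
            if max_attempts is not None and attempts >= max_attempts:
                raise
            # Back off before reconnecting so failures do not spin the CPU.
            await asyncio.sleep(retry_interval)


async def demo() -> int:
    failures_left = 2

    async def flaky_session() -> None:
        nonlocal failures_left
        if failures_left > 0:
            failures_left -= 1
            raise ConnectionLost("simulated network error")

    return await run_with_reconnect(flaky_session, retry_interval=0.01)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The loop only works if errors inside the session actually reach it, which is the point made above about letting exceptions propagate.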

Yep, got it.

One additional question re: ^^^. I noticed your advanced example uses an AsyncExitStack. If task grouping for cancellation is the primary concern here, is anyio really needed, or can I stick with the built-in primitives?

You can stick with the built-in primitives. 👍 That’s what I do in asyncio-mqtt. That being said, if I had known about anyio (or structured concurrency in general) back when I created this library, then I would have used anyio. No doubt about that. anyio’s task groups make everything much easier to reason about.

I stick to raw asyncio for now to maintain backwards compatibility (and avoid too many dependencies). There was a discussion about this in the past: #44.
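
As a rough picture of what "built-in primitives" can look like for grouped cancellation, the sketch below registers a cleanup callback on an `AsyncExitStack` that cancels every task in a set when the block exits. It is an illustrative example and not the readme's code. `cancel_tasks` and `worker` are hypothetical helper names.

```python
import asyncio
from contextlib import AsyncExitStack
from typing import List, Set


async def cancel_tasks(tasks: Set[asyncio.Task]) -> None:
    """Cancel every task in the set and wait for all of them to finish."""
    for task in tasks:
        task.cancel()
    # return_exceptions=True collects the cancellations instead of raising them.
    await asyncio.gather(*tasks, return_exceptions=True)


async def worker(name: str, log: List[str]) -> None:
    try:
        while True:
            await asyncio.sleep(0.01)
    except asyncio.CancelledError:
        log.append(name)  # Record that cleanup ran for this worker.
        raise


async def demo() -> List[str]:
    log: List[str] = []
    async with AsyncExitStack() as stack:
        tasks: Set[asyncio.Task] = set()
        # Runs when the block exits, whether normally or because of an error.
        stack.push_async_callback(cancel_tasks, tasks)
        tasks.add(asyncio.create_task(worker("a", log)))
        tasks.add(asyncio.create_task(worker("b", log)))
        await asyncio.sleep(0.05)  # Let the workers run for a moment.
    return sorted(log)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The bookkeeping (tracking the set, cancelling, gathering) is the part that a task group would otherwise handle automatically.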

Got it. I’m looking for less work, so I’ll check out anyio. 😂 Thanks for the recommendation!