netopeer2: Netopeer2-server crash after user disconnection

Versions: Netopeer2-server: netopeer2-2.1.71 sysrepo: sysrepo-2.2.105 libnetconf2: libnetconf2-2.1.37 libyang: libyang-2.1.111

Hi,

I have a sporadic issue (pretty rare) where the disconnection of a user seems to be the initial trigger for a crash of the netopeer2-server.

In the trace below, after the connection is broken/stopped non-gracefully (scenario where the user/client crashes and the expectation is that the server is able to recover and continue without issues), there is this trace that starts to appear: Waiting on a conditional variable failed (sr_notif_sub_del: Connection timed out).

Despite that, netopeer2-server is able to broadcast the notification about the session being terminated, but then it crashes with segfault. (unfortunately I don’t have the coredump, I will try to reproduce and collect the backtrace if possible)

Following that crash, there is this log from orusw (our app with callback on user disconnection etc through sysrepo) Waiting on a conditional variable failed (sr_shmext_notif_sub_stop: Connection timed out). followed by an assert which crashes the application as well orusw: /workspace/Services/3rdParty/sysrepo-2.2.105/src/common.c:2839: sr_rwrelock: Assertion !rwlock->readers[0] && !rwlock->upgr && (rwlock->writer == cid)’ failed.` I guess this is a consequence of netopeer2-server crashing and corrupting sysrepo, but I wanted to include it as well if it helps finding the root cause of the problem.

Log:

2023-10-10 06:04:21,180 DEBG ‘orusw’ stdout output: 2023-10-10 06:04:21/INF:UE_APP-OAM/sysrepo[RU01]: UE_APP-OAM/sysrepo, EV LISTEN: “/o-ran-supervision:supervision-watchdog-reset” “rpc” ID 151 priority 0 success (remaining 0 subscribers). 2023-10-10 06:04:21/INF:UE_APP-OAM/sysrepo[RU01]: UE_APP-OAM/sysrepo, Session 3503 (user “root”, CID 28) created.

2023-10-10 06:04:21,181 DEBG ‘netopeer2-server’ stdout output: [INF]: SR: EV ORIGIN: “/o-ran-supervision:supervision-watchdog-reset” “rpc” ID 151 priority 0 succeeded.

2023-10-10 06:04:21,181 DEBG ‘orusw’ stdout output: 2023-10-10 06:04:21/INF:UE_APP-OAM/orusw[RU01]: airphone-orusim/orusw/o-ran-supervision.c:109 Number of subscribers to supervision is 1

2023-10-10 06:04:21,182 DEBG ‘netopeer2-server’ stdout output: [INF]: NP: Session 1: thread 2 event new RPC.

2023-10-10 06:04:47,358 DEBG ‘netopeer2-server’ stdout output: [2023/10/10 06:04:47.358228, 1] ssh_socket_exception_callback: Socket exception callback: 1 (0) [2023/10/10 06:04:47.358337, 1] ssh_socket_exception_callback: Socket error: disconnected

2023-10-10 06:04:47,359 DEBG ‘netopeer2-server’ stdout output: [ERR]: LN: Session 1: SSH channel poll error (Socket error: disconnected). [INF]: NP: Session 1: thread 1 event session terminated.

2023-10-10 06:04:47,648 DEBG ‘netopeer2-server’ stdout output: [INF]: LN: Call Home client “NC-Client-1” session terminated. [INF]: LN: Trying to connect via IPv4 to 192.168.216.2:4334.

2023-10-10 06:04:47,648 DEBG ‘netopeer2-server’ stdout output: [INF]: LN: getsockopt() error (Connection refused).

2023-10-10 06:04:49,648 DEBG ‘netopeer2-server’ stdout output: [INF]: LN: Trying to connect via IPv4 to 192.168.216.2:4334.

2023-10-10 06:04:49,649 DEBG ‘netopeer2-server’ stdout output: [INF]: LN: getsockopt() error (Connection refused).

2023-10-10 06:04:51,649 DEBG ‘netopeer2-server’ stdout output: [INF]: LN: Call Home client “NC-Client-1” endpoint “default-ssh” failed connection attempt limit 3 reached. [INF]: LN: Call Home client “NC-Client-1” endpoint “default-ssh” connecting… [INF]: LN: Trying to connect via IPv4 to 192.168.216.2:4334.

2023-10-10 06:04:51,649 DEBG ‘netopeer2-server’ stdout output: [INF]: LN: getsockopt() error (Connection refused).

2023-10-10 06:04:53,650 DEBG ‘netopeer2-server’ stdout output: [INF]: LN: Trying to connect via IPv4 to 192.168.216.2:4334.

2023-10-10 06:04:53,651 DEBG ‘netopeer2-server’ stdout output: [INF]: LN: getsockopt() error (Connection refused).

2023-10-10 06:04:55,651 DEBG ‘netopeer2-server’ stdout output: [INF]: LN: Trying to connect via IPv4 to 192.168.216.2:4334.

2023-10-10 06:04:56,152 DEBG ‘netopeer2-server’ stdout output: [INF]: LN: Timed out after 500 ms (Operation now in progress).

2023-10-10 06:04:58,152 DEBG ‘netopeer2-server’ stdout output: [INF]: LN: Call Home client “NC-Client-1” endpoint “default-ssh” failed connection attempt limit 3 reached. [INF]: LN: Call Home client “NC-Client-1” endpoint “default-ssh” connecting… [INF]: LN: Trying to connect via IPv4 to 192.168.216.2:4334.

2023-10-10 06:04:58,653 DEBG ‘netopeer2-server’ stdout output: [INF]: LN: Timed out after 500 ms (Operation now in progress).

2023-10-10 06:05:00,655 DEBG ‘netopeer2-server’ stdout output: [INF]: LN: Trying to connect the pending socket 49.

2023-10-10 06:05:01,156 DEBG ‘netopeer2-server’ stdout output: [INF]: LN: Timed out after 500 ms (Operation now in progress).

2023-10-10 06:05:02,374 DEBG ‘netopeer2-server’ stdout output: [ERR]: SR: Waiting on a conditional variable failed (sr_notif_sub_del: Connection timed out).

2023-10-10 06:05:03,156 DEBG ‘netopeer2-server’ stdout output: [INF]: LN: Trying to connect the pending socket 49.

2023-10-10 06:05:03,656 DEBG ‘netopeer2-server’ stdout output: [INF]: LN: Timed out after 500 ms (Operation now in progress).

2023-10-10 06:05:05,657 DEBG ‘netopeer2-server’ stdout output: [INF]: LN: Call Home client “NC-Client-1” endpoint “default-ssh” failed connection attempt limit 3 reached. [INF]: LN: Call Home client “NC-Client-1” endpoint “default-ssh” connecting…

2023-10-10 06:05:05,657 DEBG ‘netopeer2-server’ stdout output: [INF]: LN: Trying to connect via IPv4 to 192.168.216.2:4334.

2023-10-10 06:05:06,157 DEBG ‘netopeer2-server’ stdout output: [INF]: LN: Timed out after 500 ms (Operation now in progress).

2023-10-10 06:05:08,158 DEBG ‘netopeer2-server’ stdout output: [INF]: LN: Trying to connect the pending socket 49.

2023-10-10 06:05:08,658 DEBG ‘netopeer2-server’ stdout output: [INF]: LN: Timed out after 500 ms (Operation now in progress).

2023-10-10 06:05:10,658 DEBG ‘netopeer2-server’ stdout output: [INF]: LN: Trying to connect the pending socket 49.

2023-10-10 06:05:11,159 DEBG ‘netopeer2-server’ stdout output: [INF]: LN: Timed out after 500 ms (Operation now in progress).

2023-10-10 06:05:13,159 DEBG ‘netopeer2-server’ stdout output: [INF]: LN: Call Home client “NC-Client-1” endpoint “default-ssh” failed connection attempt limit 3 reached. [INF]: LN: Call Home client “NC-Client-1” endpoint “default-ssh” connecting… [INF]: LN: Trying to connect via IPv4 to 192.168.216.2:4334.

2023-10-10 06:05:13,660 DEBG ‘netopeer2-server’ stdout output: [INF]: LN: Timed out after 500 ms (Operation now in progress).

2023-10-10 06:05:15,660 DEBG ‘netopeer2-server’ stdout output: [INF]: LN: Trying to connect the pending socket 49.

2023-10-10 06:05:16,160 DEBG ‘netopeer2-server’ stdout output: [INF]: LN: Timed out after 500 ms (Operation now in progress).

2023-10-10 06:05:17,375 DEBG ‘netopeer2-server’ stdout output: [ERR]: SR: Waiting on a conditional variable failed (sr_notif_sub_del: Connection timed out). [INF]: SR: EV ORIGIN: “ietf-netconf-notifications” “notif” ID 41 priority 0 for 5 subscribers published.

2023-10-10 06:05:17,375 DEBG ‘netopeer2-server’ stdout output: [INF]: NP: Generated new event (netconf-session-end). [INF]: SR: EV LISTEN: “ietf-netconf-notifications” “notif” ID 41 processing.

2023-10-10 06:05:17,376 DEBG ‘orusw’ stdout output: 2023-10-10 06:05:17/INF:UE_APP-OAM/sysrepo[RU01]: UE_APP-OAM/sysrepo, EV LISTEN: “ietf-netconf-notifications” “notif” ID 41 processing.

2023-10-10 06:05:17,376 DEBG ‘orusw’ stdout output: 2023-10-10 06:05:17/INF:UE_APP-OAM/orusw[RU01]: airphone-orusim/orusw/ietf-netconf-notifications.c:35 NETCONF Client 192.168.216.2 disconnected 2023-10-10 06:05:17/INF:UE_APP-OAM/orusw[RU01]: airphone-orusim/orusw/ietf-netconf-notifications.c:39 NETCONF Session Termination Reason is other 2023-10-10 06:05:17/INF:UE_APP-OAM/sysrepo[RU01]: UE_APP-OAM/sysrepo, EV LISTEN: “ietf-netconf-notifications” “notif” ID 41 priority 0 success (remaining 2 subscribers).

2023-10-10 06:05:17,376 DEBG ‘UserManagement’ stderr output: 2023-10-10 06:05:17.376 INFO/UE_APP-OAM/UserManagement: UserManagement.py:208 - callbackUserDisconnection() - Received notification {‘username’: ‘docomo999’, ‘session-id’: 1, ‘source-host’: ‘192.168.216.2’, ‘termination-reason’: ‘other’} at 1696917917 (type realtime).

2023-10-10 06:05:17,380 DEBG ‘UserManagement’ stderr output: 2023-10-10 06:05:17.379 DEBUG/UE_APP-OAM/UserManagement: UserManagement.py:224 - callbackUserDisconnection() - Disconnected user is not sudo but None, nothing to do

2023-10-10 06:05:17,397 DEBG fd 90 closed, stopped monitoring <POutputDispatcher at 140189124471584 for <Subprocess at 140189126124880 with name netopeer2-server in state RUNNING> (stdout)> 2023-10-10 06:05:18,399 INFO exited: netopeer2-server (terminated by SIGSEGV (core dumped); not expected) 2023-10-10 06:05:18,399 DEBG received SIGCHLD indicating a child quit 2023-10-10 06:05:22,182 DEBG ‘orusw’ stdout output: 2023-10-10 06:05:22/INF:UE_APP-OAM/orusw[RU01]: airphone-orusim/orusw/o-ran-supervision.c:122 Expiration of supervision interval 60 (pid = 140704008832576) 2023-10-10 06:05:22/INF:UE_APP-OAM/sysrepo[RU01]: UE_APP-OAM/sysrepo, Session 3515 (user “root”, CID 28) created. [WRN] Connection with CID 27 is dead. 2023-10-10 06:05:22/WRN:UE_APP-OAM/sysrepo[RU01]: UE_APP-OAM/sysrepo, Connection with CID 27 is dead.

2023-10-10 06:05:37,184 DEBG ‘orusw’ stdout output: [ERR] Waiting on a conditional variable failed (sr_shmext_notif_sub_stop: Connection timed out). 2023-10-10 06:05:37/ERR:UE_APP-OAM/sysrepo[RU01]: UE_APP-OAM/sysrepo, Waiting on a conditional variable failed (sr_shmext_notif_sub_stop: Connection timed out). [WRN] Recovering module “o-ran-supervision” notification subscription of CID 27. 2023-10-10 06:05:37/WRN:UE_APP-OAM/sysrepo[RU01]: UE_APP-OAM/sysrepo, Recovering module “o-ran-supervision” notification subscription of CID 27. orusw: /workspace/Services/3rdParty/sysrepo-2.2.105/src/common.c:2839: sr_rwrelock: Assertion `!rwlock->readers[0] && !rwlock->upgr && (rwlock->writer == cid)’ failed. 2023-10-10 06:05:37/INF:UE_APP-OAM/orusw[RU01]: airphone-orusim/orusw/o-ran-main.c:232 Handler called for signal Aborted 2023-10-10 06:05:37/INF:UE_APP-OAM/orusw[RU01]: airphone-orusim/orusw/o-ran-main.c:263 Received SIGSEGV or SIGABRT /opt/UE_APP_OAM/service/oran/orusw() [0x451852] /lib64/libc.so.6(+0x54df0) [0x7ff8a2b4fdf0] /lib64/libc.so.6(+0xa154c) [0x7ff8a2b9c54c] /lib64/libc.so.6(raise+0x16) [0x7ff8a2b4fd46] /lib64/libc.so.6(abort+0xd3) [0x7ff8a2b237f3] /lib64/libc.so.6(+0x2871b) [0x7ff8a2b2371b] /lib64/libc.so.6(+0x4dce6) [0x7ff8a2b48ce6] /opt/UE_APP_OAM/lib64/libsysrepo.so.7(+0x24461) [0x7ff8a351b461] /opt/UE_APP_OAM/lib64/libsysrepo.so.7(+0x54fee) [0x7ff8a354bfee] /opt/UE_APP_OAM/lib64/libsysrepo.so.7(+0x30e07) [0x7ff8a3527e07] /opt/UE_APP_OAM/lib64/libsysrepo.so.7(+0x65ac3) [0x7ff8a355cac3] /opt/UE_APP_OAM/lib64/libsysrepo.so.7(sr_notif_send_tree+0x7a6) [0x7ff8a3512e16] /opt/UE_APP_OAM/service/oran/orusw(monitorCallback+0x113) [0x456240] /opt/UE_APP_OAM/service/oran/orusw(sampleThread+0x179) [0x45641b] /lib64/libc.so.6(+0x9f802) [0x7ff8a2b9a802] 2023-10-10 06:05:37/ERR:UE_APP-OAM/orusw[RU01]: airphone-orusim/orusw/o-ran-main.c:283 You can check for coredumps in /opt/dev 2023-10-10 06:05:37/INF:UE_APP-OAM/orusw[RU01]: airphone-orusim/orusw/o-ran-main.c:232 Handler called for signal Aborted

2023-10-10 06:05:37,186 DEBG ‘orusw’ stdout output:

========== DATA FOR “ietf-netconf-monitoring” “/ietf-netconf-monitoring:netconf-state/ietf-netconf-monitoring:capabilities” request_id=1 REQUESTED =======================

========== DATA FOR “ietf-netconf-monitoring” “/ietf-netconf-monitoring:netconf-state/ietf-netconf-monitoring:capabilities” request_id=2 REQUESTED =======================

2023-10-10 06:05:37,188 DEBG fd 19 closed, stopped monitoring <POutputDispatcher at 140189124470816 for <Subprocess at 140189126124640 with name orusw in state RUNNING> (stdout)> 2023-10-10 06:05:37,188 INFO exited: orusw (exit status 1; not expected) 2023-10-10 06:05:37,188 DEBG received SIGCHLD indicating a child quit

About this issue

  • Original URL
  • State: closed
  • Created 9 months ago
  • Comments: 24 (13 by maintainers)

Most upvoted comments

Creating short-lived sessions repeatedly is not a problem at all, they are cheap to make, unlike connections.

Let me know when it is public, I can rerun and see how it goes with the changes.

I am making progress but at least a week or 2 are still needed.