libzmq: PUB crash when SUB exceeded SNDHWM
Issue description
When all of these conditions are satisfied, the assertion failure from mtrie.cpp occurs:
- A connection between a PUB socket and many SUB sockets.
- A SUB socket subscribes to and unsubscribes from many prefixes.
- Call zmq_getsockopt() with ZMQ_EVENTS for SUB sockets.
Assertion failed: erased == 1 (src/mtrie.cpp:297)
[1] 30266 abort (core dumped) ./a.out
Environment
- libzmq version (commit hash if unreleased): 4.2.0 and 4.2.3
- OS: Ubuntu 16.04 LTS
Minimal test code / Steps to reproduce the issue
To reproduce this crash, we should prepare a PUB socket and many SUB sockets.
We will call this sequence (pseudo-code):
    pub.connect(sub) or sub.connect(pub)
    pub.getsockopt(ZMQ_EVENTS)
    sub.subscribe(prefix)
    sub.getsockopt(ZMQ_EVENTS)
    sub.unsubscribe(prefix)
    sub.getsockopt(ZMQ_EVENTS)
There will be many prefixes to subscribe/unsubscribe.
Calling getsockopt(ZMQ_EVENTS) after a SUB socket's SUBSCRIBE/UNSUBSCRIBE, or after the PUB socket's zmq_connect(), crashes with the assertion failure in mtrie_t::rm_helper.
You can switch the PUB<->SUB connection topology with the pub_to_sub variable.
#include "zmq.h"
#include <stdio.h>

// Set 1 or 0 to switch the PUB<->SUB connection topology.
static int pub_to_sub = 1;

void gen_topic(unsigned int n, char* topic)
{
    // Simple hash function to generate a subscription prefix from a number.
    // Unsigned arithmetic wraps around instead of overflowing.
    n = n * 2654435761u;
    sprintf(topic, "%08x", n);
}

void getsockopt_events_within_many_subscriptions(void* sub)
{
    char topic[9];  // 8 hex digits plus the terminating NUL
    char opt[256];
    size_t opt_len = 256;
    for (int j = 0; j < 10000; ++j)
    {
        gen_topic(j, topic);
        zmq_setsockopt(sub, ZMQ_SUBSCRIBE, topic, 8);
        // CRASH: Get ZMQ_EVENTS from a SUB socket.
        zmq_getsockopt(sub, ZMQ_EVENTS, opt, &opt_len);
    }
    for (int j = 0; j < 10000; ++j)
    {
        gen_topic(j, topic);
        zmq_setsockopt(sub, ZMQ_UNSUBSCRIBE, topic, 8);
        // CRASH: Get ZMQ_EVENTS from a SUB socket.
        zmq_getsockopt(sub, ZMQ_EVENTS, opt, &opt_len);
    }
}

int main()
{
    printf("%d.%d.%d\n", ZMQ_VERSION_MAJOR, ZMQ_VERSION_MINOR, ZMQ_VERSION_PATCH);
    void *context = zmq_ctx_new();
    void *pub = zmq_socket(context, ZMQ_PUB);
    void *sub;
    char addr[256]; size_t addr_len = 256;
    char opt[256]; size_t opt_len = 256;
    if (pub_to_sub)
    {
        // PUB->SUB
        for (int i = 0; i < 100; ++i)
        {
            sub = zmq_socket(context, ZMQ_SUB);
            zmq_bind(sub, "tcp://127.0.0.1:*");
            // Reset the length: zmq_getsockopt overwrites it with the actual size.
            addr_len = sizeof(addr);
            zmq_getsockopt(sub, ZMQ_LAST_ENDPOINT, addr, &addr_len);
            zmq_connect(pub, addr);
            getsockopt_events_within_many_subscriptions(sub);
        }
    }
    else
    {
        // SUB->PUB
        zmq_bind(pub, "tcp://127.0.0.1:*");
        zmq_getsockopt(pub, ZMQ_LAST_ENDPOINT, addr, &addr_len);
        for (int i = 0; i < 100; ++i)
        {
            sub = zmq_socket(context, ZMQ_SUB);
            zmq_connect(sub, addr);
            getsockopt_events_within_many_subscriptions(sub);
            // CRASH: Get ZMQ_EVENTS from the PUB socket.
            zmq_getsockopt(pub, ZMQ_EVENTS, opt, &opt_len);
        }
    }
}
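A side note on the topic generator in the test code: "%08x" produces eight hex digits, so the destination buffer needs nine bytes once the terminating NUL is counted. The following is a minimal bounds-checked sketch of the same hash, not part of the original report. The names gen_topic_safe and TOPIC_BUF_LEN are hypothetical, and the 64-bit intermediate is only there so the truncation to 32 bits is well defined on every platform.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 8 hex digits plus the terminating NUL (hypothetical constant). */
#define TOPIC_BUF_LEN 9

/* Writes an 8-character hex prefix derived from n into out.
   Returns 0 on success, -1 if out_len is too small to hold it. */
static int gen_topic_safe(uint32_t n, char *out, size_t out_len)
{
    /* Multiply in 64 bits, then keep the low 32 bits of the product. */
    uint32_t h = (uint32_t) ((uint64_t) n * UINT64_C(2654435761));

    /* snprintf never writes more than out_len bytes and reports the
       length it wanted to write, so truncation can be detected. */
    int written = snprintf(out, out_len, "%08x", (unsigned int) h);
    if (written < 0 || (size_t) written >= out_len)
        return -1;
    return 0;
}
```

With a buffer declared as char topic[TOPIC_BUF_LEN], the first eight bytes can be passed to zmq_setsockopt() with a length of 8, exactly as the test code does.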
What’s the actual result? (include assertion message & call stack if applicable)
$ gcc zmq_events_crash.c -L ~/usr/local/lib -lzmq && ./a.out
4.2.3
Assertion failed: erased == 1 (src/mtrie.cpp:297)
[1] 30266 abort (core dumped) ./a.out
What’s the expected result?
$ gcc zmq_events_crash.c -L ~/usr/local/lib -lzmq && ./a.out
4.2.3
$ echo $?
0
When SUB sockets connect to the PUB socket, this crash doesn’t happen.
About this issue
- State: closed
- Created 6 years ago
- Comments: 60 (29 by maintainers)
Commits related to this issue
- Remove PUB/SUB SIGABRT test case It is libzmq issue. PyZMQ doesn't have a responsibility to fix it. The issue is reported at https://github.com/zeromq/libzmq/issues/2942. — committed to what-studio/pyzmq by sublee 6 years ago
- Problem: no mention of #2942 in NEWS Solution: add it — committed to zeromq/libzmq by bluca 6 years ago
@bluca Maybe I'll find some time tomorrow to add sufficient tests, so that we can discuss the consistency of mtrie behaviour.
At the moment my impression is that the assertion is too strict within mtrie, but it may well be worth an assertion at the call site. I did not dig into the larger picture yet.
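To make the distinction concrete, here is a toy sketch of the two places such an assertion could live. It is not libzmq's mtrie_t and makes no claim about how rm_helper works; the toy_set type and the toy_add, toy_rm and toy_rm_existing functions are all hypothetical. The container only reports how many entries it erased, and a caller that knows the entry must exist asserts on that count itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ENTRIES 16

/* Toy stand-in for a subscription container (hypothetical). */
struct toy_set {
    const char *items[TOY_MAX_ENTRIES];
    size_t count;
};

/* Adds a prefix; returns false when the set is full. */
static bool toy_add(struct toy_set *s, const char *prefix)
{
    if (s->count == TOY_MAX_ENTRIES)
        return false;
    s->items[s->count++] = prefix;
    return true;
}

/* Removes one matching prefix and returns the number erased (0 or 1).
   The container does not assert; a missing entry is just reported. */
static size_t toy_rm(struct toy_set *s, const char *prefix)
{
    for (size_t i = 0; i < s->count; ++i) {
        if (strcmp(s->items[i], prefix) == 0) {
            s->items[i] = s->items[--s->count];
            return 1;
        }
    }
    return 0;
}

/* A call site that knows the entry must be present asserts here. */
static void toy_rm_existing(struct toy_set *s, const char *prefix)
{
    size_t erased = toy_rm(s, prefix);
    assert(erased == 1);
    (void) erased; /* avoids an unused-variable warning under NDEBUG */
}
```

A caller that cannot guarantee the entry exists would call toy_rm() and handle a return value of 0, while one that can would use toy_rm_existing().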
I renamed the branch 😉