libzmq: assertion failure in mailbox.cpp:82
Environment
- OS: MS Windows 7 Ultimate SP1
- Compiler: Visual Studio 2013
- ZeroMQ version: 4.0.4, in both variants: linking the compiled DLL and compiling the ZMQ source code.
- We also checked the previous stable version 3.2.4 and the development master on GitHub; both show the same problem.
How to reproduce
The C++ application runs normally, and after several hours zmq_assert fails in zmq::mailbox_t::recv at line 82 and the application crashes. The time to failure isn't constant; it can be 4 hours or more. It occurs even if the application has not had any ZeroMQ TCP connections.
We found that ZeroMQ creates an internal TCP connection (several sockets), apparently to pass signals between ZeroMQ threads. These are different sockets from the ones used for publisher-subscriber connections.
The assertion fires when a disconnect event occurs on this internal TCP connection (we saw it in a network sniffer). We could not find what initiated the disconnect, and we did not find any earlier call from ZeroMQ that closes the socket.
The disconnect event changes the status of the socket as seen by the WinSock select() function, which triggers a read operation:
void zmq::select_t::loop ()
{
. . .
if (FD_ISSET (fds [i].fd, &readfds))
fds [i].events->in_event (); // <-- line 197
. . .
}
http://msdn.microsoft.com/en-us/library/windows/desktop/ms740141(v=vs.85).aspx
readfds:
- If listen has been called and a connection is pending, accept will succeed.
- Data is available for reading (includes OOB data if SO_OOBINLINE is enabled).
- Connection has been closed/reset/terminated.
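The third condition can be seen in isolation with a small probe. This is only a sketch under assumptions: it uses POSIX sockets rather than WinSock, and `probe_readable` and `ReadState` are hypothetical names that appear nowhere in libzmq. It shows that a socket reported as readable by select() may have no data at all, because the peer has closed the connection and recv() returns 0.

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical result type for this sketch.
enum class ReadState { NotReadable, DataAvailable, PeerClosed, Error };

// Polls fd once with select() (zero timeout). If the socket is reported
// readable, peeks one byte to tell real data apart from a closed connection.
inline ReadState probe_readable(int fd) {
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    timeval tv{};  // zero timeout: select() returns immediately

    const int rc = select(fd + 1, &readfds, nullptr, nullptr, &tv);
    if (rc < 0) return ReadState::Error;
    if (rc == 0 || !FD_ISSET(fd, &readfds)) return ReadState::NotReadable;

    char byte = 0;
    const ssize_t n = recv(fd, &byte, 1, MSG_PEEK);  // does not consume data
    if (n > 0) return ReadState::DataAvailable;
    if (n == 0) return ReadState::PeerClosed;  // readable, but nothing to read
    return ReadState::Error;
}
```

A caller that treats "readable" as "a message is waiting" will be surprised by the `PeerClosed` case, which matches the failure described in this report.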
ZeroMQ then tries to read data from the socket:
void zmq::io_thread_t::in_event ()
{
. . .
command_t cmd;
int rc = mailbox.recv (&cmd, 0); // <-- line 69
. . .
}
The check after the read then fails because there is nothing to read:
int zmq::mailbox_t::recv (command_t *cmd_, int timeout_)
{
. . .
// Get a command.
errno_assert (rc == 0);
bool ok = cpipe.read (cmd_);
zmq_assert (ok); // <-- line 82 (crashed here)
return 0;
}
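The chain above can be illustrated with a toy model. This is not libzmq code: `ToyMailbox`, `Command` and `spurious_wakeup` are hypothetical names, and the real mailbox is more elaborate. The sketch assumes only what the report describes: a wake-up signal normally means a command is waiting, and a wake-up without a command is the case the assertion rejects.

```cpp
#include <deque>

// Hypothetical command type; libzmq's command_t carries more fields.
struct Command { int id; };

// Toy model: a command queue plus a counter that stands in for the
// signaling socket being reported as readable.
class ToyMailbox {
public:
    // Normal path: every queued command comes with exactly one signal.
    void send(Command cmd) {
        pipe_.push_back(cmd);
        ++signals_;
    }

    // Models select() marking the signaling socket readable although no
    // command was written, for example because the connection was closed.
    void spurious_wakeup() { ++signals_; }

    bool readable() const { return signals_ > 0; }

    // Consumes one signal and tries to read a command. A false return is
    // the situation in which zmq_assert (ok) would abort the process.
    bool recv(Command *cmd) {
        if (signals_ == 0) return false;
        --signals_;
        if (pipe_.empty()) return false;
        *cmd = pipe_.front();
        pipe_.pop_front();
        return true;
    }

private:
    std::deque<Command> pipe_;
    int signals_ = 0;
};
```

After `send`, `recv` returns true and yields the command. After `spurious_wakeup`, the mailbox still looks readable but `recv` returns false.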
About this issue
- Original URL
- State: closed
- Created 10 years ago
- Comments: 41 (17 by maintainers)
Commits related to this issue
- Don't delay reception of signal - new code may help undersdtand issue #1108 (https://github.com/zeromq/libzmq/issues/1108) - code cleanups — committed to hurtonm/libzmq by hurtonm 10 years ago
I can reproduce this error if I accidentally call zmsg_send on a socket from one thread while calling zmq_poll on the same socket from another thread. According to the ZMQ documentation, this is not currently allowed. This is an easy mistake to make if you expect your application to call send() from the primary application thread and have another asynchronous thread handling receive(). If you need asynchronous send and receive, you should perform both operations in the same thread: first call zmq_poll() with a low timeout (5 or 10 ms), then perform all pending zmsg_send operations in a single pass. The pending sends should be kept in a protected queue so that your main application thread can write to it and the receive thread can read from it.
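A minimal sketch of the protected queue described above. It is only an illustration under assumptions: `Outbox` and `io_pass` are hypothetical names, and the ZeroMQ calls are left to caller-supplied callbacks so the example compiles without libzmq. In a real program, `poll_once` would wrap zmq_poll() with a short timeout and `send_one` would wrap zmsg_send(), both on a socket used only by the I/O thread.

```cpp
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of ZeroMQ): a mutex-protected list of
// outgoing messages shared between the main thread and the I/O thread.
class Outbox {
public:
    // Called from the main application thread.
    void push(std::string msg) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(msg));
    }

    // Called from the I/O thread: takes everything queued so far, in
    // FIFO order, and leaves the queue empty.
    std::vector<std::string> drain() {
        std::vector<std::string> taken;
        std::lock_guard<std::mutex> lock(mutex_);
        taken.swap(pending_);
        return taken;
    }

private:
    std::mutex mutex_;
    std::vector<std::string> pending_;
};

// One pass of the I/O thread's loop: poll for incoming traffic first,
// then send every pending message. Both callbacks run on the calling
// thread, so the socket is never touched from two threads.
template <typename PollFn, typename SendFn>
void io_pass(Outbox &outbox, PollFn poll_once, SendFn send_one) {
    poll_once();
    const std::vector<std::string> batch = outbox.drain();
    for (const std::string &msg : batch) send_one(msg);
}
```

The main thread only ever calls `push`; the I/O thread calls `io_pass` in a loop. The lock is held just long enough to swap the vectors, so a slow send does not block the main thread.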
Additionally you need to make sure you don’t create or destroy your sockets in the constructor for your class, but instead in the context of the thread. That way the socket will be entirely managed by the thread. I have found this approach eliminates the mailbox.cpp error.
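The ownership rule above might look roughly like this. It is a sketch, not ZeroMQ code: `FakeSocket` is a hypothetical stand-in for a real socket and `Worker` is an illustrative class name. The point is that the constructor creates nothing, and the socket's whole lifetime falls inside the thread function.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for a ZeroMQ socket; it only records the thread
// on which it was created.
struct FakeSocket {
    std::thread::id created_on = std::this_thread::get_id();
};

class Worker {
public:
    Worker() = default;  // deliberately creates no socket here
    ~Worker() { stop(); }
    Worker(const Worker &) = delete;
    Worker &operator=(const Worker &) = delete;

    void start() { thread_ = std::thread([this] { run(); }); }

    void stop() {
        running_.store(false);
        if (thread_.joinable()) thread_.join();
    }

    // Meaningful after stop(): the thread on which the socket lived.
    std::thread::id socket_thread() const { return socket_thread_; }

private:
    void run() {
        FakeSocket socket;  // created on the worker thread
        socket_thread_ = socket.created_on;
        while (running_.load()) {
            // A real loop would poll and send on `socket` here.
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }  // socket is destroyed here, still on the worker thread

    std::atomic<bool> running_{true};
    std::thread::id socket_thread_{};
    std::thread thread_;
};
```

After `start()` and `stop()`, `socket_thread()` differs from the id of the thread that constructed the `Worker`, which is the property the advice above is after.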
If anyone knows another model that works for asynchronous handling in ZeroMQ that is thread safe, I would love to hear it.