evennia: [BUG] Corrupt log locks server
Describe the bug
This is related to the old bug where the AMP sends garbage data in a loop between the portal and server. It seems when this information is dumped to the server.log, it locks down the entire process: the server can no longer be gracefully shut down, so it requires a kill -9 to stop every time. Interestingly, even trying to tail the log can freeze up a shell sometimes, at least when ssh’d in on MacOS. Using file on the log says it’s now a binary rather than text file.
Example output:
0-02-10 08:52:16-05 [::] return Int16StringReceiver.dataReceived(self, data)
20-02-10 08:52:16-05 [::] File "/home/tehom/arx/arxvenv/lib/python3.8/site-packages/twisted/protocols/basic.py", line 766, in dataReceived
20-02-10 08:52:16-05 [::] self.stringReceived(packet)
20-02-10 08:52:16-05 [::] File "/home/tehom/arx/arxvenv/lib/python3.8/site-packages/twisted/protocols/basic.py", line 878, in stringReceived
20-02-10 08:52:16-05 [::] self.state = statehandler(string)
20-02-10 08:52:16-05 [::] File "/home/tehom/arx/arxvenv/lib/python3.8/site-packages/twisted/protocols/amp.py", line 2459, in proto_key
20-02-10 08:52:16-05 [::] self.boxReceiver.ampBoxReceived(self._currentBox)
20-02-10 08:52:16-05 [::] File "/home/tehom/arx/arxvenv/lib/python3.8/site-packages/twisted/protocols/amp.py", line 1058, in ampBoxReceived
20-02-10 08:52:16-05 [::] self._answerReceived(box)
20-02-10 08:52:16-05 [::] File "/home/tehom/arx/arxvenv/lib/python3.8/site-packages/twisted/protocols/amp.py", line 984, in _answerReceived
20-02-10 08:52:16-05 [::] question = self._outstandingRequests.pop(box[ANSWER])
20-02-10 08:52:16-05 [::] KeyError: b'5825'
20-02-10 08:52:16-05 [EE] Discarded incoming partial data: _ask3b3_commandMsgPortal2Server
20-02-10 08:52:16-05 [EE] packed_dataFxÚk`�jÁ¾�BµSzXJR+J¦ÄNÑ�²3SrR§$ÖNÑèaMÎMÉL�âÍÖÃ�_P���W<¥vJijqÛ=²Yc
20-02-10 08:52:17-05 [::] Traceback (most recent call last):
20-02-10 08:52:17-05 [::] File "/home/tehom/arx/evennia/evennia/server/portal/amp.py", line 315, in dataReceived
20-02-10 08:52:17-05 [::] super(AMPMultiConnectionProtocol, self).dataReceived(data)
20-02-10 08:52:17-05 [::] File "/home/tehom/arx/arxvenv/lib/python3.8/site-packages/twisted/protocols/amp.py", line 2406, in dataReceived
20-02-10 08:52:17-05 [::] return Int16StringReceiver.dataReceived(self, data)
20-02-10 08:52:17-05 [::] File "/home/tehom/arx/arxvenv/lib/python3.8/site-packages/twisted/protocols/basic.py", line 766, in dataReceived
20-02-10 08:52:17-05 [::] self.stringReceived(packet)
20-02-10 08:52:17-05 [::] File "/home/tehom/arx/arxvenv/lib/python3.8/site-packages/twisted/protocols/basic.py", line 878, in stringReceived
20-02-10 08:52:17-05 [::] self.state = statehandler(string)
20-02-10 08:52:17-05 [::] File "/home/tehom/arx/arxvenv/lib/python3.8/site-packages/twisted/protocols/amp.py", line 2459, in proto_key
20-02-10 08:52:17-05 [::] self.boxReceiver.ampBoxReceived(self._currentBox)
20-02-10 08:52:17-05 [::] File "/home/tehom/arx/arxvenv/lib/python3.8/site-packages/twisted/protocols/amp.py", line 1058, in ampBoxReceived
20-02-10 08:52:17-05 [::] self._answerReceived(box)
20-02-10 08:52:17-05 [::] File "/home/tehom/arx/arxvenv/lib/python3.8/site-packages/twisted/protocols/amp.py", line 984, in _answerReceived
20-02-10 08:52:17-05 [::] question = self._outstandingRequests.pop(box[ANSWER])
20-02-10 08:52:17-05 [::] KeyError: b'5825'
20-02-10 08:52:17-05 [EE] Discarded incoming partial data: _ask3b3_commandAdminPortal2Server
20-02-10 08:52:17-05 [EE] packed_data
20-02-10 08:52:17-05 [EE] xÚM�ÏJÃ@Æ[�6�¶">
20-02-10 08:52:17-05 [EE] -bí1MB(êF¢E/Öì¶·Ù�Ý�a
20-02-10 08:52:17-05 [EE] ÆgóuÜþQ:��o~³3óÞþ~j66q39yÃSpdÉ+ªsY 4��+e£�âÚÐ++©e&EúÂ_:���k�2V/0
20-02-10 08:52:17-05 [EE] ¹ÃáØ=w£
20-02-10 08:52:17-05 [EE] KÕ9C�M+�æ'°³� ÐÚÒv]Ð%7�#äbÁY�ø V¹¥N&�"Õ¹qDÞÏ×£åù}8Î�,Tét�«üYì³îÛ«9ë���
20-02-10 08:52:17-05 [EE] ¼jÀÑÿ"sAj³\7$~
20-02-10 08:52:17-05 [EE] LI�f
20-02-10 08:52:17-05 [EE] =?»DèÝùI�$ô�01
20-02-10 08:52:17-05 [EE] LÉíì'³È¨Vâ=�Ç&±_Ç�×ë�U+^¥»�Õõº¿�3�Þ
20-02-10 08:52:18-05 [::] Traceback (most recent call last):
20-02-10 08:52:18-05 [::] File "/home/tehom/arx/evennia/evennia/server/portal/amp.py", line 315, in dataReceived
20-02-10 08:52:18-05 [::] super(AMPMultiConnectionProtocol, self).dataReceived(data)
20-02-10 08:52:18-05 [::] File "/home/tehom/arx/arxvenv/lib/python3.8/site-packages/twisted/protocols/amp.py", line 2406, in dataReceived
20-02-10 08:52:18-05 [::] return Int16StringReceiver.dataReceived(self, data)
20-02-10 08:52:18-05 [::] File "/home/tehom/arx/arxvenv/lib/python3.8/site-packages/twisted/protocols/basic.py", line 766, in dataReceived
20-02-10 08:52:18-05 [::] self.stringReceived(packet)
20-02-10 08:52:18-05 [::] File "/home/tehom/arx/arxvenv/lib/python3.8/site-packages/twisted/protocols/basic.py", line 878, in stringReceived
20-02-10 08:52:18-05 [::] self.state = statehandler(string)
20-02-10 08:52:18-05 [::] File "/home/tehom/arx/arxvenv/lib/python3.8/site-packages/twisted/protocols/amp.py", line 2459, in proto_key
20-02-10 08:52:18-05 [::] self.boxReceiver.ampBoxReceived(self._currentBox)
20-02-10 08:52:18-05 [::] File "/home/tehom/arx/arxvenv/lib/python3.8/site-packages/twisted/protocols/amp.py", line 1058, in ampBoxReceived
20-02-10 08:52:18-05 [::] self._answerReceived(box)
20-02-10 08:52:18-05 [::] File "/home/tehom/arx/arxvenv/lib/python3.8/site-packages/twisted/protocols/amp.py", line 984, in _answerReceived
20-02-10 08:52:18-05 [::] question = self._outstandingRequests.pop(box[ANSWER])
20-02-10 08:52:18-05 [::] KeyError: b'5825'
20-02-10 08:52:18-05 [EE] Discarded incoming partial data: _ask3b3_commandAdminPortal2Server
20-02-10 08:52:18-05 [EE] packed_dataTxÚ=POOÂ0
20-02-10 08:52:18-05 [EE]
20-02-10 08:52:18-05 [EE]òoxÐ/áÅ%9�®n
20-02-10 08:52:18-05 [EE] [»�"x"
20-02-10 08:52:18-05 [EE] ▒��l��OF�
20-02-10 08:52:18-05 [EE] <ÔoàwòëØÚËû½_ßï½ä½V¿¾Ë¥Ý
20-02-10 08:52:18-05 [EE] �§/ê\6ø6Íb±â▒��<ÍsÍ�XĬmÆ_ðõü!}VòX¤ëM*�¬ÅI�i�íÎ ow»ûÒîô{Z²³H�N�µÅ�ÙXðÍf.V�©rÁÏç¬`[�ü»/×ñ}¾K¬#©��«dµË�k%1�"�Ô�lbN���W³L5Ô
20-02-10 08:52:18-05 [EE] Ô§æ¿A¨K�§åê]¶öÇSì0O�J#�ÿ9»
20-02-10 08:52:18-05 [EE] ÛoÏrÙ¾¡D"�� Ò·UÆîB3Tc��¡É¤th �f�
20-02-10 08:52:18-05 [EE] ��o�º>c(
20-02-10 08:52:18-05 [EE] ºW==7¡�aHwRU²b�s
20-02-10 08:52:18-05 [EE] ®
20-02-10 08:52:18-05 [EE] S|ö�fóCñEñ¦ì_�T�§
To Reproduce
Steps to reproduce the behavior:
- Random error in AMP 2.try to shut down server from console
- Command freezes
- Have to kill -9 the process
Expected behavior
The game to handle the AMP error gracefully and not cause the process to lock up.
Environment, Evennia version, OS etc
If unsure, run evennia -v or get the first few lines of the about command in-game.
Additional context
Getting rid of the discarded partial data bit might solve it, since that seems like it could be what’s screwing up the log. Alternately, making shutdown not write to the log so that it can be independent of a buggy log might work, or wrapping logging during shutdown in a deferred with an errback, maybe? Something like that.
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 34 (22 by maintainers)
Commits related to this issue
- Do not try to log partial binary data content, see #2053 — committed to evennia/evennia by Griatch 4 years ago
- Handle AMP issue. Resolve #2053. — committed to evennia/evennia by Griatch 2 years ago
- Handle AMP issue. Resolve #2053. — committed to evennia/evennia by Griatch 2 years ago
This is the tested and working version of my suggestion to manually clean the buffer: https://pastebin.com/jsuEpTKN
The downside is that because it interacts with lower layers of twisted infrastructure directly, it may break if Twisted ever updates that code and you update Twisted.
Thanks a lot for the discussion and help to figure this one out (or at least mitigate it)!
I have still not been able to determine exactly what causes this error at its core, but based on @Zironic’s work, I did test and incorporate a patch for Evennia 1.0-dev (develop branch). I took the slightly different route of replacing
stringReceivedon theAMPMultiConnectionPrototocol(adding atry...except KeyErrorinside it) - this meant copying less code out of the twisted library and seems to work with the added benefit of not disconnecting the server from the portal at any time - so it should not run into Tehom’s extended error above. Using the example code to trigger this, this solution seems to work correctly, but I’ve never been able to reproduce the original error … so if you try outdevelopand find that this solution still has issues, comment to re-open this again.So I can sort-of replicate this behavior the brute-force way, by modifying Twisted itself (if someone’s interested in playing around with it):
In
twisted/protocols/amp.py, line 976:So there is a 1% chance the error emerges, and once it does, it will keep failing. It’s crude but it mimics the behavior and shows a few things:
amp.py’sAMPMultiConnectionProtocol.dataReceivedthat re-raises the KeyError if there are more than X errors raised in a row. This resets the connection and stops the failing loop, at least in this case (since it’s a new protocol instance with only 1% chance of failing in my test).