accumulo: Thrift frame size errors
We are seeing the following message continuously on a tserver:
Read a frame size of 1458446612, which is bigger than the maximum allowable buffer sized for ALL connections.
I realized that #3042 reported the same thing against 2.1. However, this is against 1.10.2, and it does not appear to be caused by port scanning or similar activity. I say this because only one tserver reports the error, and as soon as I drop that tserver, the errors move to another tserver. Hence it is somehow tied to a tablet.
After increasing the logging on various things, it seems correlated with opening connections to tservers that are only handling the accumulo.metadata table, but I do not have absolute proof of this.
Environment:
- Accumulo 1.10.2
- CentOS 7.5
- Java Corretto-8.302.08 (1.8.0_302-b08)
- Hadoop 3.3.3 (vanilla)
- ZooKeeper 3.7.1
The error comes from org.apache.thrift.server.AbstractNonblockingServer, which is extended by org.apache.accumulo.server.rpc.CustomNonBlockingServer. It is logged in the read() method of the inner FrameBuffer class.
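A framed Thrift transport prefixes each message with a 4-byte big-endian length, and the nonblocking server's FrameBuffer reads that prefix before allocating a buffer for the message body. The sketch below is hypothetical (FrameSizeCheck, readFrameSize and exceedsLimit are not Thrift or Accumulo APIs, and the 1 GiB limit is an assumed value) and only illustrates the arithmetic: the four bytes 0x56 0xEE 0x21 0x14 decode to the 1458446612 in the error message. Because whatever four bytes arrive at the start of a frame are interpreted this way, a corrupted or non-framed stream can yield such a value.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; this is not Thrift's FrameBuffer implementation.
public class FrameSizeCheck {

    /** Decodes the 4-byte big-endian length prefix that starts every framed message. */
    static int readFrameSize(byte[] header) {
        // ByteBuffer uses big-endian byte order by default, matching the framed transport.
        return ByteBuffer.wrap(header, 0, 4).getInt();
    }

    /** True if a frame of this size would be rejected under the given limit (assumed check). */
    static boolean exceedsLimit(int frameSize, long maxFrameBytes) {
        return frameSize > maxFrameBytes;
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0x56, (byte) 0xEE, (byte) 0x21, (byte) 0x14};
        int frameSize = readFrameSize(header);
        long assumedLimit = 1L << 30; // 1 GiB, an assumed value for illustration
        System.out.println(frameSize);                             // 1458446612
        System.out.println(exceedsLimit(frameSize, assumedLimit)); // true
    }
}
```

The real limit comes from the server's configuration, so whether a given value is rejected depends on how that limit is set.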
About this issue
- State: open
- Created 2 years ago
- Comments: 17 (17 by maintainers)
Commits related to this issue
- Make sure maxMessageSize is correctly set when Thrift is configured to use a non blocking server TServerUtils was not correctly setting the max frame size value on the constructor when creating a TNo... — committed to cshannon/accumulo by cshannon 2 years ago
https://github.com/apache/accumulo/pull/3103 addresses the null-logging issue from the original patch in #3047.
The problem is that the server is nonblocking, so different threads can process the same FrameBuffer. When I tested with increased logging, I saw different threads calling read() and invoke() on the same buffer, so capturing the client address in invoke() and storing it in a ThreadLocal doesn't work. The new PR simply captures the client address when the FrameBuffer is created and stores it as a String for easy logging, as a new FrameBuffer is allocated for each request.
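The ThreadLocal problem described above can be reproduced without Thrift. In this hypothetical sketch, Buffer stands in for a FrameBuffer (it is not the real class), the address string is made up, and two single-thread executors play the threads that run invoke() and read(). The value one thread stores in a ThreadLocal is invisible to the other thread, which sees null, while a field captured when the buffer is constructed is visible to both.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of per-thread vs per-buffer storage; not Thrift or Accumulo code.
public class ClientAddressCapture {

    // Broken approach: the address is stored per thread and looked up later.
    static final ThreadLocal<String> CLIENT_ADDRESS = new ThreadLocal<>();

    // Stand-in for a FrameBuffer that captures the address once, at creation.
    static final class Buffer {
        final String clientAddress;

        Buffer(String clientAddress) {
            this.clientAddress = clientAddress;
        }
    }

    /** Returns {address seen via ThreadLocal, address seen via field} from a second thread. */
    static String[] lookUpFromOtherThread(Buffer buffer) {
        ExecutorService invokeThread = Executors.newSingleThreadExecutor();
        ExecutorService readThread = Executors.newSingleThreadExecutor();
        try {
            // One thread handles invoke() and records the address in its own ThreadLocal.
            invokeThread.submit(() -> CLIENT_ADDRESS.set(buffer.clientAddress)).get();
            // A different thread handles read() on the same buffer and tries to log the address.
            String viaThreadLocal = readThread.submit(() -> CLIENT_ADDRESS.get()).get();
            String viaField = readThread.submit(() -> buffer.clientAddress).get();
            return new String[] {viaThreadLocal, viaField};
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            invokeThread.shutdown();
            readThread.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] seen = lookUpFromOtherThread(new Buffer("10.0.0.5:54321"));
        System.out.println("ThreadLocal: " + seen[0]); // null
        System.out.println("Field:       " + seen[1]); // 10.0.0.5:54321
    }
}
```

Storing the address on the buffer ties it to the object every thread shares, rather than to whichever thread happened to run invoke().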
With this new patch the logging should now work correctly and not return null.