OpenSearch: [Regression] Request decompression was broken in 2.11
Describe the bug
I updated OpenSearch from 2.9.0 to 2.11.0, and indexing requests began failing with binary data in the error responses:
[2023/10/20 09:53:38] [error] [output:opensearch:opensearch.0] HTTP status=400 URI=/_bulk, response:
{"error":{"root_cause":[{"type":"json_parse_exception","reason":"Illegal character ((CTRL-CHAR, code 31)): only regular white space (\\r, \\n, \\t) is allowed between tokens\n at [Source: (byte[])\"\\u001F�\\u0008\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000��Vmo�6\\u0010��_a�si�]�1\rs\\u001D%\\u0015�7XJ�n)\\u000C��<.\\u0012����4�\\u007F�Qr\\u001Ags\\u0002\\u000CӇ��{���!�\\u00147t8}\\u0018�\\u0018���t�SQ��7(c\r�)jq��������UT5��\\u0001e\\u0019��L\\u0003YFjL��=����{�\r�\\u000Fw\\u001F��t\\u0017_\\u0000ʇ+��؛L|�0��7݀\\u0016�\"; line: 1, column: 2]"}],"type":"json_parse_exception","reason":"Illegal character ((CTRL-CHAR, code 31)): only regular white space (\\r, \\n, \\t) is allowed between tokens\n at [Source: (byte[])\"\\u001F�\\u0008\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000��Vmo�6\\u0010��_a�si�]�1\rs\\u001D%\\u0015�7XJ�n)\\u000C��<.\\u0012����4�\\u007F�Qr\\u001Ags\\u0002\\u000CӇ��{���!�\\u00147t8}\\u0018�\\u0018���t�SQ��7(c\r�)jq��������UT5��\\u0001e\\u0019��L\\u0003YFjL��=����{�\r�\\u000Fw\\u001F��t\\u0017_\\u0000ʇ+��؛L|�0��7݀\\u0016�\"; line: 1, column: 2]"},"status":400}
We use Fluent Bit, which has a compression option (`Compress gzip`). Turning it off worked around the problem, but we can't keep it disabled permanently because we have a lot of traffic and need to reduce its cost.
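The "CTRL-CHAR, code 31" in the error above is itself a telltale: 31 (0x1F) is the first byte of the gzip magic number, which means the server handed the still-compressed body straight to the JSON parser. A minimal sketch (the bulk payload contents here are made up for illustration):

```python
import gzip

# A minimal bulk-style payload like Fluent Bit would send (contents hypothetical).
payload = b'{"index":{"_index":"logs"}}\n{"message":"hello"}\n'
compressed = gzip.compress(payload)

# Every gzip stream starts with the magic bytes 0x1F 0x8B. Byte 0x1F is
# decimal 31 -- exactly the "Illegal character ((CTRL-CHAR, code 31))"
# the JSON parser rejects when the body is never decompressed.
assert compressed[:2] == b"\x1f\x8b"
assert compressed[0] == 31
```

So the error is not corrupted data from the client; it is a correctly gzipped request body that the server failed to decompress before parsing.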
Expected behavior
Compression should work on version 2.11.0.
Plugins
opensearch-alerting 2.11.0.0
opensearch-anomaly-detection 2.11.0.0
opensearch-asynchronous-search 2.11.0.0
opensearch-cross-cluster-replication 2.11.0.0
opensearch-custom-codecs 2.11.0.0
opensearch-geospatial 2.11.0.0
opensearch-index-management 2.11.0.0
opensearch-job-scheduler 2.11.0.0
opensearch-knn 2.11.0.0
opensearch-ml 2.11.0.0
opensearch-neural-search 2.11.0.0
opensearch-notifications 2.11.0.0
opensearch-notifications-core 2.11.0.0
opensearch-observability 2.11.0.0
opensearch-performance-analyzer 2.11.0.0
opensearch-reports-scheduler 2.11.0.0
opensearch-security 2.11.0.0
opensearch-security-analytics 2.11.0.0
opensearch-sql 2.11.0.0
repository-s3 2.11.0
Host/Environment (please complete the following information):
- OS: Azure K8s Service (AKS) v.1.26.6 with Ubuntu nodes v.22.04
- Version: 2.11.0
About this issue
- State: closed
- Created 8 months ago
- Reactions: 1
- Comments: 18 (10 by maintainers)
Commits related to this issue
- Use new instance of Decompressor on channel initialization (#3583) ### Description Resolves an issue with decompression that can lead to concurrent gzipped requests failing. This removes the `@Sh... — committed to opensearch-project/security by cwperks 8 months ago
Thank you for providing the configuration @kinseii. I was able to reproduce the issue and found the likely culprit. In 2.11 a change was made to keep unauthenticated request bodies compressed, to avoid the cost of decompressing them. In https://github.com/opensearch-project/security/pull/3418, the decompressor was replaced with a subclass of Netty's `HttpContentDecompressor` that overrides the content encoding when the request body should remain compressed. The Decompressor in that PR adds a `@Sharable` annotation and uses the same instance across multiple channels, but it should not, since it is a stateful handler. Thank you for reporting this issue.
Link to PR to address the issue: https://github.com/opensearch-project/security/pull/3583
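To illustrate why a shared, stateful decompressor fails under concurrent gzipped requests, here is a hedged Python analogy (not the actual Netty code): a single decompressor instance sees interleaved chunks from two "channels" and corrupts both streams, while one decompressor per channel, as the fix does on channel initialization, handles the same interleaving correctly.

```python
import gzip
import zlib

def interleave_shared(c1, c2):
    """Feed chunks of two gzip streams through ONE decompressor, the way
    a stateful @Sharable handler would see concurrent channels."""
    d = zlib.decompressobj(wbits=31)  # wbits=31 -> expect gzip framing
    mid1, mid2 = len(c1) // 2, len(c2) // 2
    try:
        out1 = d.decompress(c1[:mid1])
        out2 = d.decompress(c2[:mid2])  # second channel's bytes corrupt stream state
        out1 += d.decompress(c1[mid1:])
        out2 += d.decompress(c2[mid2:])
    except zlib.error:
        return None, None  # shared internal state blew up mid-stream
    return out1, out2

def interleave_per_channel(c1, c2):
    """One decompressor instance per channel (the fix): interleaving is harmless."""
    d1, d2 = zlib.decompressobj(wbits=31), zlib.decompressobj(wbits=31)
    mid1, mid2 = len(c1) // 2, len(c2) // 2
    out1 = d1.decompress(c1[:mid1])
    out2 = d2.decompress(c2[:mid2])
    out1 += d1.decompress(c1[mid1:])
    out2 += d2.decompress(c2[mid2:])
    return out1, out2

body1 = b'{"index":{}}\n' * 50
body2 = b'{"create":{}}\n' * 50
s1, s2 = interleave_shared(gzip.compress(body1), gzip.compress(body2))
p1, p2 = interleave_per_channel(gzip.compress(body1), gzip.compress(body2))
assert (s1, s2) != (body1, body2)  # shared instance fails or corrupts output
assert (p1, p2) == (body1, body2)  # per-channel instances decode correctly
```

The analogy also explains why the bug was intermittent: a single client sending one request at a time would rarely hit it, but high-volume senders like Fluent Bit multiplex enough concurrent gzipped requests to interleave chunks on the shared instance.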
I have to admit that I am amazed at how complex, confusing and, most importantly, time-consuming the workflow is, especially coupled with a critical issue. I'm very grateful to the team for the product, but I would like to see critical issues resolved a little faster in the future.
I think this would be the one suspect as well https://github.com/opensearch-project/OpenSearch/pull/10261, thanks @peternied
@kinseii thank you for reporting, @nknize I have only this suspect at the moment https://github.com/opensearch-project/OpenSearch/pull/9367