security: [BUG] [CI] Investigate Flaky test failure for windows CI tasks
What is the bug?
Windows CI has been flaky in recent runs, specifically for the citest and dlicRestApiTest tasks.
See these runs for example:
- https://github.com/opensearch-project/security/actions/runs/5577190458/attempts/1 (3 tasks failed)
- https://github.com/opensearch-project/security/actions/runs/5577190458/attempts/2 (2 tasks failed; 1 previously failed task passed)
- https://github.com/opensearch-project/security/actions/runs/5577190458/attempts/3 (1 task failed)
- https://github.com/opensearch-project/security/actions/runs/5577190458/attempts/5 (all tasks passed)
Failure Details
| | (citest, windows-latest, 11) | (citest, windows-latest, 17) |
|---|---|---|
| 1st run | - org.opensearch.security.SecurityAdminTests.testSecurityAdmin - org.opensearch.security.auditlog.compliance.RestApiComplianceAuditlogTest.testRestApiRolesEnabled - org.opensearch.security.auditlog.compliance.ComplianceAuditlogTest.testComplianceEnable - org.opensearch.security.auditlog.compliance.RestApiComplianceAuditlogTest.testRestApiRolesDisabled - org.opensearch.security.multitenancy.test.MultitenancyTests.testTenantParametersSubstitution - org.opensearch.security.auditlog.integration.BasicAuditlogTest.testDeleteByQuery - org.opensearch.security.httpclient.HttpClientTest.testPlainConnection - org.opensearch.security.auditlog.compliance.ComplianceAuditlogTest.testComplianceEnable - org.opensearch.security.auditlog.compliance.RestApiComplianceAuditlogTest.testRestApiRolesEnabled | - com.amazon.dlic.auth.http.saml.HTTPSamlAuthenticatorTest.initialConnectionFailureTest - com.amazon.dlic.auth.ldap2.LdapBackendIntegTest2.testIntegLdapAuthenticationSSL - org.opensearch.security.SecurityAdminTests.testIsLegacySecurityIndexOnV7Index - org.opensearch.security.auditlog.compliance.ComplianceAuditlogTest.testInternalConfig - org.opensearch.security.auditlog.compliance.ComplianceAuditlogTest.testComplianceEnable - org.opensearch.security.auditlog.compliance.ComplianceAuditlogTest.testInternalConfig - org.opensearch.security.auditlog.compliance.RestApiComplianceAuditlogTest.testBCryptHashRedaction - org.opensearch.security.auditlog.compliance.RestApiComplianceAuditlogTest.testRestApiRolesDisabled - org.opensearch.security.multitenancy.test.TenancyPrivateTenantEnabledTests.testPrivateTenantDisabled_Update_EndToEnd - org.opensearch.security.auditlog.integration.BasicAuditlogTest.testSensitiveMethodRedaction - org.opensearch.security.auditlog.integration.BasicAuditlogTest.testScroll - org.opensearch.security.auditlog.integration.BasicAuditlogTest.testScroll - org.opensearch.security.auditlog.compliance.RestApiComplianceAuditlogTest.testBCryptHashRedaction |
| 2nd run | - org.opensearch.security.SecurityAdminTests.testSecurityAdminRegularUpdate - org.opensearch.security.auditlog.compliance.ComplianceAuditlogTest.testInternalConfig - org.opensearch.security.auditlog.integration.BasicAuditlogTest.testDeleteByQuery - org.opensearch.security.multitenancy.test.TenancyMultitenancyEnabledTests.testMultitenancyDisabled_endToEndTest - org.opensearch.security.auditlog.compliance.ComplianceAuditlogTest.testInternalConfig - org.opensearch.security.auditlog.integration.BasicAuditlogTest.testDeleteByQuery | - com.amazon.dlic.auth.ldap2.LdapBackendIntegTest2.testAttributesWithImpersonation - org.opensearch.security.auditlog.compliance.ComplianceAuditlogTest.testWriteHistory - org.opensearch.security.auditlog.compliance.RestApiComplianceAuditlogTest.testBCryptHashRedaction - org.opensearch.security.multitenancy.test.MultitenancyTests.testTenantParametersSubstitution - org.opensearch.security.multitenancy.test.MultitenancyTests.testMt - org.opensearch.security.multitenancy.test.MultitenancyTests.testMtMulti - org.opensearch.security.auditlog.compliance.RestApiComplianceAuditlogTest.testBCryptHashRedaction |
| 3rd run | - org.opensearch.security.SecurityAdminTests.testSecurityAdmin - org.opensearch.security.SecurityAdminTests.testSecurityAdminInvalidCert - org.opensearch.security.auditlog.compliance.RestApiComplianceAuditlogTest.testRestApiRolesEnabled - org.opensearch.security.auditlog.compliance.RestApiComplianceAuditlogTest.testRestApiNewUser - org.opensearch.security.auditlog.compliance.ComplianceAuditlogTest.testUpdate - org.opensearch.security.auditlog.compliance.ComplianceAuditlogTest.testInternalConfig - org.opensearch.security.auditlog.compliance.RestApiComplianceAuditlogTest.testRestApiRolesEnabled - org.opensearch.security.auditlog.integration.BasicAuditlogTest.testDeleteByQuery - org.opensearch.security.multitenancy.test.MultitenancyTests.testMt - org.opensearch.security.auditlog.integration.BasicAuditlogTest.testDeleteByQuery - org.opensearch.security.multitenancy.test.TenancyMultitenancyEnabledTests.testMultitenancyDisabled_endToEndTest - org.opensearch.security.auditlog.compliance.RestApiComplianceAuditlogTest.testRestApiNewUser - org.opensearch.security.auditlog.compliance.ComplianceAuditlogTest.testInternalConfig - org.opensearch.security.auditlog.compliance.ComplianceAuditlogTest.testUpdate | |
Test Methods with failures across all mentioned runs
- org.opensearch.security.SecurityAdminTests.testSecurityAdmin
- org.opensearch.security.auditlog.compliance.RestApiComplianceAuditlogTest.testRestApiRolesEnabled
- org.opensearch.security.auditlog.compliance.ComplianceAuditlogTest.testComplianceEnable
- org.opensearch.security.auditlog.compliance.RestApiComplianceAuditlogTest.testRestApiRolesDisabled
- org.opensearch.security.multitenancy.test.MultitenancyTests.testTenantParametersSubstitution
- org.opensearch.security.auditlog.integration.BasicAuditlogTest.testDeleteByQuery
- org.opensearch.security.httpclient.HttpClientTest.testPlainConnection
- com.amazon.dlic.auth.http.saml.HTTPSamlAuthenticatorTest.initialConnectionFailureTest
- com.amazon.dlic.auth.ldap2.LdapBackendIntegTest2.testIntegLdapAuthenticationSSL
- org.opensearch.security.SecurityAdminTests.testIsLegacySecurityIndexOnV7Index
- org.opensearch.security.auditlog.compliance.ComplianceAuditlogTest.testInternalConfig
- org.opensearch.security.auditlog.compliance.RestApiComplianceAuditlogTest.testBCryptHashRedaction
- org.opensearch.security.multitenancy.test.TenancyPrivateTenantEnabledTests.testPrivateTenantDisabled_Update_EndToEnd
- org.opensearch.security.auditlog.integration.BasicAuditlogTest.testSensitiveMethodRedaction
- org.opensearch.security.auditlog.integration.BasicAuditlogTest.testScroll
- org.opensearch.security.multitenancy.test.MultitenancyTests.testMt
- org.opensearch.security.multitenancy.test.MultitenancyTests.testMtMulti
- org.opensearch.security.auditlog.compliance.ComplianceAuditlogTest.testWriteHistory
- org.opensearch.security.auditlog.compliance.RestApiComplianceAuditlogTest.testRestApiNewUser
- org.opensearch.security.auditlog.compliance.ComplianceAuditlogTest.testUpdate
- org.opensearch.security.SecurityAdminTests.testSecurityAdminInvalidCert
- org.opensearch.security.multitenancy.test.TenancyMultitenancyEnabledTests.testMultitenancyDisabled_endToEndTest
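A list like the one above can be derived mechanically from the per-run failures, and counting how many runs each test failed in helps rank the flakiest tests. The following is only a rough sketch: the class `FlakyTestTally` and its methods are hypothetical helpers, and the sample data is a hand-picked subset of the (citest, windows-latest, 11) column with shortened test names.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

public class FlakyTestTally {

    /** Counts in how many runs each test failed; repeats within a single run count once. */
    static Map<String, Integer> failuresPerTest(List<List<String>> runs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> run : runs) {
            // De-duplicate within the run so a test reported twice is not double counted.
            for (String test : new LinkedHashSet<>(run)) {
                counts.merge(test, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Illustrative subset of the failures reported for the three failing attempts. */
    static List<List<String>> sampleRuns() {
        return List.of(
            List.of("SecurityAdminTests.testSecurityAdmin",
                    "BasicAuditlogTest.testDeleteByQuery",
                    "ComplianceAuditlogTest.testComplianceEnable",
                    "ComplianceAuditlogTest.testComplianceEnable"),
            List.of("ComplianceAuditlogTest.testInternalConfig",
                    "BasicAuditlogTest.testDeleteByQuery"),
            List.of("SecurityAdminTests.testSecurityAdmin",
                    "BasicAuditlogTest.testDeleteByQuery",
                    "ComplianceAuditlogTest.testInternalConfig"));
    }

    public static void main(String[] args) {
        // Prints one line per distinct test: the number of runs it failed in, then its name.
        failuresPerTest(sampleRuns()).forEach((test, n) -> System.out.println(n + "  " + test));
    }
}
```

In this subset, `BasicAuditlogTest.testDeleteByQuery` fails in all three attempts, which makes it a reasonable first candidate to investigate.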
What is the expected behavior?
No flakiness.
About this issue
- State: closed
- Created a year ago
- Comments: 22 (18 by maintainers)
I think so too; I was not able to replicate the issue on the RC.
That being said, I think the code path here in the SecurityInterceptor and the corresponding code in the receiver are dead code. In reality, the user is not serialized on local node requests because it takes the bypass route described here.
I think I nailed it, folks:
We run 3 nodes in a single JVM, with 3 security plugins, and only one of them wins. @DarshitChanpura I think at this moment this is a 100% test-related problem that should not leak into production.
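To make this failure mode concrete, here is a minimal sketch of how several nodes in one JVM can overwrite each other's state. It assumes, purely for illustration, that a plugin keeps its notion of the local node in a static field; `FakePlugin`, `FakeNode`, and `localNode` are invented names and do not reflect the security plugin's actual code.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedStaticStateDemo {

    /** Hypothetical plugin that keeps per-node state in a static field. */
    static class FakePlugin {
        // Shared by every FakePlugin instance in the JVM, so the last writer wins.
        private static String localNode;

        FakePlugin(String nodeName) {
            localNode = nodeName;
        }

        String localNode() {
            return localNode;
        }
    }

    /** Hypothetical node: "core" keeps its own name in an instance field. */
    static class FakeNode {
        final String name;
        final FakePlugin plugin;

        FakeNode(String name) {
            this.name = name;
            this.plugin = new FakePlugin(name);
        }

        /** True when core and the plugin agree on which node this is. */
        boolean identitiesAgree() {
            return name.equals(plugin.localNode());
        }
    }

    /** Starts the nodes in order and returns those whose plugin disagrees with core. */
    static List<String> nodesWithIdentityMismatch(String... names) {
        List<FakeNode> nodes = new ArrayList<>();
        for (String n : names) {
            nodes.add(new FakeNode(n));
        }
        List<String> mismatched = new ArrayList<>();
        for (FakeNode node : nodes) {
            if (!node.identitiesAgree()) {
                mismatched.add(node.name);
            }
        }
        return mismatched;
    }

    public static void main(String[] args) {
        // With three nodes, only the last one started still matches the static field.
        System.out.println("Mismatched nodes: " + nodesWithIdentityMismatch("node1", "node2", "node3"));
    }
}
```

With a single node per JVM, as in a production deployment, the mismatch list is empty, which is consistent with the view that the problem is confined to the test setup.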
In doing more local testing, I think I see an identity crisis happening where core thinks the local node is node1 and the security plugin thinks it's node2. I'm looking into how this can happen now. Both the security plugin and core should be aware of the node they are running on without any conflicts.
I added logging statements in `TransportService.getConnection` and then again inside `SecurityInterceptor.sendRequestDecorate`, which is called directly after `getConnection` on remote node requests, to determine that there was a discrepancy between what core thought the localNode was and what the security plugin thought the localNode was.
@DarshitChanpura it looks like there were a few fixes put into main for windows support that may not have been backported to 2.x.
Particularly this one: https://github.com/opensearch-project/security/pull/2180
There is a list of issues that @peternied added to this PR description: https://github.com/opensearch-project/security/pull/2291 - it may be best to backport those that can be backported.