security: [BUG] Errors/Broken operations during rolling upgrade of clusters from 1.3 to 2.0
What is the bug? Errors/Broken search results during rolling upgrade of clusters from 1.3 to 2.0
How can one reproduce the bug?
- Create a 1.3 cluster with at least 2 nodes.
- Create an index with 2 primary shards so that at least one primary is allocated on each node.
- Upgrade one of the nodes to OpenSearch 2.0.
- From the 1.3 node, run a search query that hits all the shards.
- Observe that the search fails on one of the shards:
{
  "took" : 40,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 1,
    "failures" : [
      {
        "shard" : 1,
        "index" : "test-index",
        "node" : "O7kxX-lMTAKvXBj91-LQ8Q",
        "reason" : {
          "type" : "exception",
          "reason" : "java.lang.ClassNotFoundException: com.amazon.opendistroforelasticsearch.security.user.User",
          "caused_by" : {
            "type" : "class_not_found_exception",
            "reason" : "class_not_found_exception: com.amazon.opendistroforelasticsearch.security.user.User"
          }
        }
      }
    ]
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        .....
      }
    ]
  }
}
Note that the failed shard count is 1.
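The `ClassNotFoundException` in the failure above means the receiving JVM could not load a class by the name it was given. A minimal Java sketch of that failure mode, assuming (as with Java deserialization) that the node resolves the class by name on a class path that does not ship the ODFE packages. `ResolveLegacyClass` and `isResolvable` are hypothetical names used only for illustration:

```java
// Hypothetical demo: a JVM without the ODFE classes on its class path
// cannot resolve a class name that carries the legacy ODFE package prefix.
public class ResolveLegacyClass {

    // Returns true if the current class path can load the named class.
    static boolean isResolvable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            // Same exception type as the one reported in the shard failure.
            return false;
        }
    }

    public static void main(String[] args) {
        String legacy = "com.amazon.opendistroforelasticsearch.security.user.User";
        System.out.println(legacy + " resolvable: " + isResolvable(legacy));
        System.out.println("java.lang.String resolvable: " + isResolvable("java.lang.String"));
    }
}
```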
What is the expected behavior? A rolling upgrade should complete without errors, and searches should succeed on every shard while the cluster runs mixed versions.
What is your host/environment?
- OS: 1.3 to OS: 2.0 upgrade
- Plugins: Security
About this issue
- State: closed
- Created 2 years ago
- Comments: 28 (19 by maintainers)
@ronniepg Understood, and that's what this PR is targeting: https://github.com/opensearch-project/security/pull/2268
There is logic in 1.3 that keeps backwards compatibility with ODFE by always rewriting package names from `org.opensearch` to `com.amazon.opendistroforelasticsearch`, so that ODFE nodes that pick up the messages are able to understand them. That's a problem when you are going from OS 1 to OS 2, because OS 2 does not understand the `com.amazon.opendistroforelasticsearch` packages. The PR above aims to apply the serialization logic conditionally, only if there are ODFE nodes in the cluster. If you have only OS 1 nodes and are going to OS 2, it should not perform the package rewrite when serializing for the transport action.

@peternied I will look into this today and see if it's possible to get the min node version from the ClusterInfoHolder to conditionally apply the serialization logic that replaces the package name with the opendistro package name.
- If the min node in the cluster is OS 1, there is no need to perform the rewrite.
- If the min node in the cluster is ODFE, apply the rewrite.
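The two rules above can be sketched in Java. This is a hypothetical illustration, not code from the security plugin or the PR: `ConditionalRewrite`, `NodeLine`, `minNode`, `shouldRewrite` and `classNameForWire` are invented names, node versions are reduced to an ordered enum (oldest first), and the list of nodes stands in for whatever the ClusterInfoHolder would actually provide:

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the conditional package rewrite; all names are illustrative.
public class ConditionalRewrite {

    // Declared oldest to newest, so enum ordinal order doubles as version order.
    enum NodeLine { ODFE, OPENSEARCH_1, OPENSEARCH_2 }

    static final String OS_PREFIX = "org.opensearch.";
    static final String ODFE_PREFIX = "com.amazon.opendistroforelasticsearch.";

    // Oldest node in the cluster (stand-in for the min node version).
    static NodeLine minNode(List<NodeLine> nodes) {
        return nodes.stream()
                .min(Comparator.naturalOrder())
                .orElseThrow(() -> new IllegalArgumentException("empty cluster"));
    }

    // Rewrite only when the min node is ODFE; an OS 1 min node means no rewrite.
    static boolean shouldRewrite(List<NodeLine> nodes) {
        return minNode(nodes) == NodeLine.ODFE;
    }

    // Maps org.opensearch.* to com.amazon.opendistroforelasticsearch.* only when needed.
    static String classNameForWire(String className, List<NodeLine> nodes) {
        if (shouldRewrite(nodes) && className.startsWith(OS_PREFIX)) {
            return ODFE_PREFIX + className.substring(OS_PREFIX.length());
        }
        return className;
    }

    public static void main(String[] args) {
        String user = "org.opensearch.security.user.User";
        // Mixed ODFE / OS 1 cluster: rewrite so ODFE nodes understand the message.
        System.out.println(classNameForWire(user, List.of(NodeLine.ODFE, NodeLine.OPENSEARCH_1)));
        // OS 1 to OS 2 rolling upgrade: keep the org.opensearch name.
        System.out.println(classNameForWire(user, List.of(NodeLine.OPENSEARCH_1, NodeLine.OPENSEARCH_2)));
    }
}
```

Because the enum is ordered oldest first, "the min node is ODFE" and "the cluster contains an ODFE node" become the same check. A real implementation would compare actual node versions from cluster state rather than a hand-built list.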