OpenSearch: [BUG] repository-azure plugin hangs in OpenSearch >= 1.2.0

Describe the bug

Since 1.2.0, the repository-azure plugin has stopped working correctly. The PUT command to create a new repository hangs forever, the thread pool queue fills up with 120 generic tasks, and the master node consumes all the CPU resources it has:

"CNLPL4MfQ1aOeA1io2LXKw:44940" : {
    "node" : "CNLPL4MfQ1aOeA1io2LXKw",
    "id" : 44940,
    "type" : "transport",
    "action" : "cluster:admin/snapshot/get",
    "start_time_in_millis" : 1639552118629,
    "running_time_in_nanos" : 205051397113,
    "cancellable" : false,
    "parent_task_id" : "uY6TEyVlSQCxiJxkMJq6Sg:10583",
    "headers" : { }
  },

Nothing is logged. Is there any way to enable debug logging on plugins?

Also, if you look at the transactions/sec metrics for the Azure storage account, there are thousands of them (image).

To Reproduce

Steps to reproduce the behavior:

  1. Add the Azure Storage Account info (name and SAS token) to the keystore: azure.client.default.account and azure.client.default.sas_token

  2. Create the snapshot repository.

PUT _snapshot/azure
{
  "type": "azure",
  "settings": {
    "client": "default",
    "container": "opensearch",
    "base_path": "subfolder"
  }
}

This should hang forever.

  3. Check the thread pool or running tasks.

GET /_cat/thread_pool
GET _tasks
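
When the output of GET _tasks is large, it can help to filter it for long-running entries. The following is a minimal Python sketch, not part of the original report: the helper name `long_running_tasks` and the 60-second threshold are made up for illustration, and the sample wraps the task shown earlier in the usual `nodes` / `tasks` nesting of a `_tasks` response.

```python
from typing import Any, Dict, List, Tuple


def long_running_tasks(
    tasks_response: Dict[str, Any], min_seconds: float
) -> List[Tuple[str, str, float]]:
    """Return (task id, action, running seconds) for tasks at or above min_seconds.

    Hypothetical helper: expects the parsed JSON body of GET _tasks.
    """
    found = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            seconds = task["running_time_in_nanos"] / 1e9
            if seconds >= min_seconds:
                found.append((task_id, task["action"], round(seconds, 1)))
    # Longest-running tasks first.
    return sorted(found, key=lambda item: item[2], reverse=True)


# Sample built from the task entry quoted in this report.
sample = {
    "nodes": {
        "CNLPL4MfQ1aOeA1io2LXKw": {
            "tasks": {
                "CNLPL4MfQ1aOeA1io2LXKw:44940": {
                    "node": "CNLPL4MfQ1aOeA1io2LXKw",
                    "id": 44940,
                    "type": "transport",
                    "action": "cluster:admin/snapshot/get",
                    "start_time_in_millis": 1639552118629,
                    "running_time_in_nanos": 205051397113,
                    "cancellable": False,
                    "parent_task_id": "uY6TEyVlSQCxiJxkMJq6Sg:10583",
                    "headers": {},
                }
            }
        }
    }
}

stuck = long_running_tasks(sample, min_seconds=60)
print(stuck)
```

For the sample above, the snapshot task is reported as having run for roughly 205 seconds, which matches the hang described in the report.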

Expected behavior

{
  "acknowledged" : true
}

Plugins

  • repository-azure

Host/Environment (please complete the following information):

  • OpenSearch 1.2.1 Docker image running in Kubernetes

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 7
  • Comments: 24 (10 by maintainers)

Most upvoted comments

@reta @uncycler @juntezhang care to confirm that @reta’s fix works in https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/1.2.2/102/linux/x64/builds/opensearch/core-plugins/repository-azure-1.2.2.zip? This build does not have a version increment; we’re going to do that and go to 1.2.3.

I confirm that the updated plugin is working as expected.

Confirmed that 1.2.3 produces the correct behavior for me:

./bin/opensearch-plugin install repository-azure
./bin/opensearch-keystore add azure.client.default.account
./bin/opensearch-keystore add azure.client.default.key
./bin/opensearch -d -p opensearch.pid
/usr/bin/curl http://localhost:9200/songs/_doc -X POST -H 'Content-Type: application/json' -d '{"title": "Inside Out", "artist": "Eve 6"}'
/usr/bin/curl http://localhost:9200/songs/_doc -X POST -H 'Content-Type: application/json' -d '{"title": "Semi-Charmed Life", "artist": "Third Eye Blind"}'
/usr/bin/curl http://localhost:9200/_snapshot/testbackup -X PUT -H 'Content-Type: application/json' -d '{"type": "azure", "settings": {"container": "testbackup"}}'
{"acknowledged":true}
/usr/bin/curl http://localhost:9200/_snapshot/testbackup/1 -X PUT  
{"accepted":true}

(image: azure)

Transactions / sec is back to an expected range (image: tps).

@nknize @dblock do you guys think it is worth extracting the repository plugins out of the main repo? (It looks doable in general.)

I was going to suggest that. There’s no reason for these plugins to be tied to OpenSearch IMO. I’d appreciate it if you could open an issue either way.

Let’s talk about a release for this in opensearch-project/opensearch-build#1365? Will a release of just the plugin with version 1.2.2.1 work?

@dblock I think technically it will work, but from the code perspective it will go to the 1.2 branch. Could we tie the release of the plugin to a particular commit? (I’m wondering how we could match binary and source artifacts, since it is the same repository.)

We’ll increment the version and make a tag like we always do. 1.2 is just the line for all the 1.2.x releases.

@PaulLesur @juntezhang so the issue is closely related to https://github.com/FasterXML/jackson-databind/issues/3322 and, in a nutshell, Azure Blob APIs V12 rely heavily on the fact that empty XML elements / attributes are going to be nullified.

However, sadly, this depends heavily on which XMLStreamReader implementation is picked up at runtime: Woodstox does that, whereas the default one from the JDK, com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl, does not. This leads to an infinite loop within listBlobsByHierarchy or listBlobs, because the page navigation only understands null as a termination condition.
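
The failure mode can be illustrated with a minimal Python sketch. This is not the Azure SDK's code: `fake_service`, `list_all`, and the `max_pages` guard are hypothetical names introduced here, and the sketch assumes that a service handed an empty marker simply starts again from the first page. It shows a paged listing that stops only when the continuation marker is `None`, and what happens when the parser returns an empty string for the last page's marker instead.

```python
from typing import Callable, List, Optional, Tuple

Page = Tuple[List[str], Optional[str]]  # (blob names, continuation marker)


def fake_service(nullify_empty: bool) -> Callable[[Optional[str]], Page]:
    """Return a hypothetical page fetcher backed by two pages of blob names.

    nullify_empty=True mimics a parser that maps an empty marker element to None;
    nullify_empty=False mimics one that returns "" instead.
    """
    pages = {None: (["a", "b"], "m1"), "m1": (["c"], "")}

    def fetch(marker: Optional[str]) -> Page:
        # Assumption: an empty marker means "start from the beginning".
        names, next_marker = pages[marker or None]
        if nullify_empty and next_marker == "":
            next_marker = None
        return names, next_marker

    return fetch


def list_all(fetch: Callable[[Optional[str]], Page], max_pages: int = 10):
    """Collect names, treating only a None marker as the end of the listing.

    max_pages is a safety guard for this demo; without it the second case
    below would never return. Returns (names, pages fetched).
    """
    names: List[str] = []
    marker: Optional[str] = None
    fetched = 0
    while fetched < max_pages:
        page, marker = fetch(marker)
        names.extend(page)
        fetched += 1
        if marker is None:
            break
    return names, fetched


ok_names, ok_pages = list_all(fake_service(nullify_empty=True))
bad_names, bad_pages = list_all(fake_service(nullify_empty=False))
print(ok_pages, bad_pages)
```

With nullification the listing ends after two pages. Without it the loop keeps requesting pages until the demo guard stops it, which is consistent with both the hang and the thousands of storage transactions per second reported above.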

Working on the fix now.

@reta there are no exceptions logged by OpenSearch. It just hangs.

@juntezhang looking into it