server: Custom Repository Agent never receiving TRITONREPOAGENT_ModelAction of type TRITONREPOAGENT_ACTION_LOAD_COMPLETE

Description

I have a custom repository agent, LoadCheckAgent.cpp. It is built into a .so library and added to the config.pbtxt of the models I am using. When a load request is sent to Triton with TRITONSERVER_ServerLoadModel(server_, name); the agent's TRITONREPOAGENT_ModelAction function is called as expected: debug output inside the agent prints “AGENT CHECK” on entry to the function. If the action type is TRITONREPOAGENT_ACTION_LOAD, the agent prints “MODEL LOAD - REPO”, and this message does appear at runtime.

When the model finishes loading, Triton prints a success message to the terminal:

I0927 18:27:31.167871 51235 model_lifecycle.cc:815] successfully loaded 'model_name'

However, the repository agent's TRITONREPOAGENT_ModelAction is not called again, and TRITONREPOAGENT_ACTION_LOAD_COMPLETE is never received.

Additionally, if an unload request is then sent using

TRITONSERVER_ServerUnloadModelAndDependents(server_, name);

further problems appear. Triton prints the following after the unload request:

E0927 18:27:37.661673 51235 model_lifecycle.cc:409] Agent model returns error on TRITONREPOAGENT_ACTION_UNLOAD: Internal: Unexpected lifecycle state transition from TRITONREPOAGENT_ACTION_LOAD to TRITONREPOAGENT_ACTION_UNLOAD
I0927 18:27:37.662367 51235 onnxruntime.cc:2754] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0927 18:27:37.673127 51235 onnxruntime.cc:2682] TRITONBACKEND_ModelFinalize: delete model state
I0927 18:27:37.673711 51235 model_lifecycle.cc:608] successfully unloaded 'model_name' version 1

This is immediately followed by a call to the agent's TRITONREPOAGENT_ModelAction, which prints its debug messages:

AGENT CHECK
MODEL LOAD FAILED - REPO

The second message is printed only when the TRITONREPOAGENT_ActionType received is TRITONREPOAGENT_ACTION_LOAD_FAIL.

There is also a debug message for TRITONREPOAGENT_ACTION_UNLOAD, but it is never printed, meaning the repository agent never receives the unload request.
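For reference, the agent's debug logic can be sketched roughly as follows. This is a minimal, standalone approximation and not the actual LoadCheckAgent.cpp: the local ActionType enum is a stand-in for TRITONREPOAGENT_ActionType so the snippet compiles without Triton's headers, DebugLineFor and OnModelAction are hypothetical names, and only the “MODEL LOAD - REPO” and “MODEL LOAD FAILED - REPO” strings come from the report (the other messages are placeholders).

```cpp
#include <iostream>
#include <string>

// Stand-in for TRITONREPOAGENT_ActionType (assumption: mirrored locally so
// the sketch builds without tritonrepoagent.h).
enum ActionType {
  ACTION_LOAD,
  ACTION_LOAD_COMPLETE,
  ACTION_LOAD_FAIL,
  ACTION_UNLOAD,
  ACTION_UNLOAD_COMPLETE
};

// Hypothetical helper: the debug line printed for each action type.
// Only the LOAD and LOAD_FAIL strings are taken from the report.
std::string DebugLineFor(ActionType action)
{
  switch (action) {
    case ACTION_LOAD:
      return "MODEL LOAD - REPO";
    case ACTION_LOAD_COMPLETE:
      return "MODEL LOAD COMPLETE - REPO";  // placeholder text
    case ACTION_LOAD_FAIL:
      return "MODEL LOAD FAILED - REPO";
    case ACTION_UNLOAD:
      return "MODEL UNLOAD - REPO";  // placeholder text
    case ACTION_UNLOAD_COMPLETE:
      return "MODEL UNLOAD COMPLETE - REPO";  // placeholder text
  }
  return "UNKNOWN ACTION - REPO";
}

// Approximates the body of the agent's TRITONREPOAGENT_ModelAction:
// print a marker on entry, then the line for the received action.
void OnModelAction(ActionType action)
{
  std::cout << "AGENT CHECK" << std::endl;
  std::cout << DebugLineFor(action) << std::endl;
}
```

With logic like this, a healthy load followed by an unload would be expected to produce the LOAD, LOAD_COMPLETE and UNLOAD lines in that order; the run described above shows only LOAD and then LOAD_FAIL.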

Triton Information

Triton version 2.35.0

Custom build, using an OS image that uses JetPack 5.1.1-b56 as the base, with some other changes. CUDA 11.4 is still in use. Backends pulled from tritonserver2.35.0-jetpack5.1.2.tgz directly

To Reproduce

Create a custom repository agent that outputs the type of TRITONREPOAGENT_ModelAction received. Create the .so as described in the steps here.

Place it in agents/checkload/libtritonrepoagent_checkload.so. Point Triton at the agents directory with TRITONSERVER_ServerOptionsSetRepoAgentDirectory(serverOptions, pathToAgents); and include the following in the config of an onnxruntime_onnx or tensorflow_savedmodel model:

model_repository_agents
{
  agents [
    {
      name: "checkload",
      parameters {}
    }
  ]
}

Start the server and request a load of the model.

Expected behavior

The behavior described above contradicts the expected behavior outlined by server/docs/docs/customization_guide/repository_agents.md. Here are those steps, with the contradicting behavior in boldface.

Load the model’s configuration file (config.pbtxt) and extract the ModelRepositoryAgents settings. Even if a repository agent modifies the config.pbtxt file, the repository agent settings from the initial config.pbtxt file are used for the entire loading process. For each repository agent specified:

  • Initialize the corresponding repository agent, loading the shared library if necessary. Model loading fails if the shared library is not available or if initialization fails.

  • Invoke the repository agent’s TRITONREPOAGENT_ModelAction function with action TRITONREPOAGENT_ACTION_LOAD. As input the agent can access the model’s repository as either a cloud storage location or a local filesystem location.

  • The repository agent can return success to indicate that no changes were made to the repository, can return failure to indicate that the model load should fail, or can create a new repository for the model (for example, by decrypting the input repository) and return success to indicate that the new repository should be used.

  • If the agent returns success Triton continues to the next agent. If the agent returns failure, Triton skips invocation of any additional agents.

  • If all agents returned success, Triton attempts to load the model using the final model repository.

  • For each repository agent that was invoked with TRITONREPOAGENT_ACTION_LOAD, in reverse order:

    • Triton invokes the repository agent’s TRITONREPOAGENT_ModelAction function with action TRITONREPOAGENT_ACTION_LOAD_COMPLETE if the model loaded successfully or TRITONREPOAGENT_ACTION_LOAD_FAIL if the model failed to load.
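
The documented ordering can be illustrated with a small standalone simulation. This is only a sketch of the steps as written in the guide, not Triton's implementation: Agent, SimulateLoad and the "<agent>:<action>" log format are hypothetical names introduced for illustration, and the sketch assumes that an agent whose LOAD action fails still counts as "invoked" and therefore receives LOAD_FAIL.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a configured repository agent: a name plus a
// callback reporting whether its LOAD action succeeded.
struct Agent {
  std::string name;
  std::function<bool()> on_load;
};

// Returns the ordered "<agent>:<action>" notifications that the documented
// load procedure would produce for the given agents and model load result.
std::vector<std::string> SimulateLoad(
    const std::vector<Agent>& agents, const std::function<bool()>& load_model)
{
  std::vector<std::string> log;
  std::vector<const Agent*> invoked;
  bool all_ok = true;

  // Invoke each agent with LOAD, in configuration order.
  for (const Agent& agent : agents) {
    log.push_back(agent.name + ":LOAD");
    invoked.push_back(&agent);
    if (!agent.on_load()) {
      all_ok = false;
      break;  // a failure skips any additional agents
    }
  }

  // The model load is attempted only if every agent returned success.
  const bool loaded = all_ok && load_model();

  // Notify every invoked agent of the outcome, in reverse order.
  for (auto it = invoked.rbegin(); it != invoked.rend(); ++it) {
    log.push_back((*it)->name + (loaded ? ":LOAD_COMPLETE" : ":LOAD_FAIL"));
  }
  return log;
}
```

For a single agent named checkload and a successful model load, this procedure yields checkload:LOAD followed by checkload:LOAD_COMPLETE, which is the second notification the report says never arrives.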

About this issue

  • State: closed
  • Created 9 months ago
  • Reactions: 1
  • Comments: 15 (3 by maintainers)

Most upvoted comments

I have been able to reproduce (I believe) - will continue debugging.

I found first_unload in model_lifecycle.h:InvokeAgentModels(), which is always false, resulting in an early return. I modified first_unload with the following code to get the expected result.

Before

const bool first_unload = (action_type == TRITONREPOAGENT_ACTION_UNLOAD) && (last_action_type_ != TRITONREPOAGENT_ACTION_UNLOAD);

After

const bool first_unload = (action_type != TRITONREPOAGENT_ACTION_UNLOAD) && (last_action_type_ != TRITONREPOAGENT_ACTION_UNLOAD);

After change, it’s working fine.
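
To make the effect of the two predicates easier to check, here is a standalone sketch that evaluates them outside Triton. It assumes, per the comment above, that InvokeAgentModels() returns early without calling the agent whenever first_unload is false; the local ActionType enum and the function names FirstUnloadBefore and FirstUnloadAfter are hypothetical stand-ins, not Triton code.

```cpp
// Stand-in for TRITONREPOAGENT_ActionType (assumption: mirrored locally so
// the sketch builds without Triton's headers).
enum ActionType {
  ACTION_LOAD,
  ACTION_LOAD_COMPLETE,
  ACTION_LOAD_FAIL,
  ACTION_UNLOAD,
  ACTION_UNLOAD_COMPLETE
};

// The "Before" predicate: true only for an UNLOAD request that does not
// directly follow another UNLOAD.
bool FirstUnloadBefore(ActionType action_type, ActionType last_action_type)
{
  return (action_type == ACTION_UNLOAD) && (last_action_type != ACTION_UNLOAD);
}

// The "After" predicate: true for any non-UNLOAD request, provided the
// previous action was not UNLOAD.
bool FirstUnloadAfter(ActionType action_type, ActionType last_action_type)
{
  return (action_type != ACTION_UNLOAD) && (last_action_type != ACTION_UNLOAD);
}
```

Under that early-return assumption, FirstUnloadBefore(ACTION_LOAD_COMPLETE, ACTION_LOAD) is false, so the LOAD_COMPLETE notification is dropped, which matches the report. FirstUnloadAfter(ACTION_LOAD_COMPLETE, ACTION_LOAD) is true, so LOAD_COMPLETE is delivered. Note that FirstUnloadAfter(ACTION_UNLOAD, ACTION_LOAD_COMPLETE) is false, so in this reading the modified predicate would skip UNLOAD requests; whether that is acceptable would need to be verified against the real code path.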

Apologies, let me take a look this week and provide an update.

I’ve done tests using the same OS and device. The issue persists in v2.27.0, v2.30.0, and v2.32.0; however, the repository agent behaves correctly in v2.20.0 (JP 5.0) and in v2.24.0 (JP 5.0.2).