Merlin: [BUG] Cannot load an exported DeepFM model with the NGC 22.03 inference container
Running into the following errors when starting Triton:
I0318 00:00:18.082645 172 hugectr.cc:1926] TRITONBACKEND_ModelInstanceInitialize: deepfm_0 (device 0)
I0318 00:00:18.082694 172 hugectr.cc:1566] Triton Model Instance Initialization on device 0
I0318 00:00:18.082792 172 hugectr.cc:1576] Dense Feature buffer allocation:
I0318 00:00:18.083026 172 hugectr.cc:1583] Categorical Feature buffer allocation:
I0318 00:00:18.083095 172 hugectr.cc:1601] Categorical Row Index buffer allocation:
I0318 00:00:18.083143 172 hugectr.cc:1611] Predict result buffer allocation:
I0318 00:00:18.083203 172 hugectr.cc:1939] ******Loading HugeCTR Model******
I0318 00:00:18.083217 172 hugectr.cc:1631] The model origin json configuration file path is: /ensemble_models/deepfm/1/deepfm.json
[HCTR][00:00:18][INFO][RK0][main]: Global seed is 1305961709
[HCTR][00:00:19][WARNING][RK0][main]: Peer-to-peer access cannot be fully enabled.
[HCTR][00:00:19][INFO][RK0][main]: Start all2all warmup
[HCTR][00:00:19][INFO][RK0][main]: End all2all warmup
[HCTR][00:00:19][INFO][RK0][main]: Create inference session on device: 0
[HCTR][00:00:19][INFO][RK0][main]: Model name: deepfm
[HCTR][00:00:19][INFO][RK0][main]: Use mixed precision: False
[HCTR][00:00:19][INFO][RK0][main]: Use cuda graph: True
[HCTR][00:00:19][INFO][RK0][main]: Max batchsize: 64
[HCTR][00:00:19][INFO][RK0][main]: Use I64 input key: True
[HCTR][00:00:19][INFO][RK0][main]: start create embedding for inference
[HCTR][00:00:19][INFO][RK0][main]: sparse_input name data1
[HCTR][00:00:19][INFO][RK0][main]: create embedding for inference success
[HCTR][00:00:19][INFO][RK0][main]: Inference stage skip BinaryCrossEntropyLoss layer, replaced by Sigmoid layer
I0318 00:00:19.826815 172 hugectr.cc:1639] ******Loading HugeCTR model successfully
I0318 00:00:19.827763 172 model_repository_manager.cc:1149] successfully loaded 'deepfm' version 1
E0318 00:00:19.827767 172 model_repository_manager.cc:1152] failed to load 'deepfm_nvt' version 1: Internal: TypeError: 'NoneType' object is not subscriptable
At:
/ensemble_models/deepfm_nvt/1/model.py(91): _set_output_dtype
/ensemble_models/deepfm_nvt/1/model.py(76): initialize
E0318 00:00:19.827960 172 model_repository_manager.cc:1332] Invalid argument: ensemble 'deepfm_ens' depends on 'deepfm_nvt' which has no loaded version
I0318 00:00:19.828048 172 server.cc:522]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0318 00:00:19.828117 172 server.cc:549]
+---------+---------------------------------------------------------+-----------------------------------------------+
| Backend | Path | Config |
+---------+---------------------------------------------------------+-----------------------------------------------+
| hugectr | /opt/tritonserver/backends/hugectr/libtriton_hugectr.so | {"cmdline":{"ps":"/ensemble_models/ps.json"}} |
+---------+---------------------------------------------------------+-----------------------------------------------+
I0318 00:00:19.828209 172 server.cc:592]
+------------+---------+--------------------------------------------------------------------------+
| Model | Version | Status |
+------------+---------+--------------------------------------------------------------------------+
| deepfm | 1 | READY |
| deepfm_nvt | 1 | UNAVAILABLE: Internal: TypeError: 'NoneType' object is not subscriptable |
| | | |
| | | At: |
| | | /ensemble_models/deepfm_nvt/1/model.py(91): _set_output_dtype |
| | | /ensemble_models/deepfm_nvt/1/model.py(76): initialize |
+------------+---------+--------------------------------------------------------------------------+
I0318 00:00:19.845925 172 metrics.cc:623] Collecting metrics for GPU 0: Tesla T4
I0318 00:00:19.846404 172 tritonserver.cc:1932]
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.19.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_mem |
| | ory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /ensemble_models |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------+
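The `deepfm` HugeCTR model itself loads, but the NVTabular workflow model `deepfm_nvt` fails in `initialize` → `_set_output_dtype`, where a dtype lookup evidently comes back as `None` and is then subscripted. A minimal sketch of that failure mode is below; the names are hypothetical stand-ins, not the actual generated `model.py`:

```python
# Minimal illustration of the failure mode (hypothetical stand-ins, not the
# generated NVTabular model.py): looking up a column that is missing from
# the workflow's output schema returns None, and subscripting None raises
# TypeError: 'NoneType' object is not subscriptable.
import numpy as np

# Stand-in for the workflow's output schema: column name -> metadata dict.
output_schema = {
    "DES": {"dtype": np.float32},
    "CATCOLUMN": {"dtype": np.int64},
    # "ROWINDEX" is missing, e.g. its dtype was never recorded at export time.
}

def _set_output_dtype(name):
    column = output_schema.get(name)  # returns None for a missing column
    return column["dtype"]            # TypeError when column is None

def initialize():
    return {name: _set_output_dtype(name)
            for name in ("DES", "CATCOLUMN", "ROWINDEX")}

initialize()  # raises TypeError: 'NoneType' object is not subscriptable
```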
Aha! Link: https://nvaiinfa.aha.io/features/MERLIN-818
About this issue
- State: closed
- Created 2 years ago
- Comments: 26 (12 by maintainers)
Commits related to this issue
- Hard-code the `Workflow` output dtypes in Triton Since HugeCTR always expects the same three fields, we don't have to consult the `Workflow`'s output schema to determine the dtypes. We can just hard-... — committed to karlhigley/NVTabular by karlhigley 2 years ago
- Hard-code the `Workflow` output dtypes in Triton (#1468) Since HugeCTR always expects the same three fields, we don't have to consult the `Workflow`'s output schema to determine the dtypes. We can ju... — committed to NVIDIA-Merlin/NVTabular by karlhigley 2 years ago
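The fix in these commits sidesteps the schema lookup: since the HugeCTR backend always consumes the same three tensors (dense features, categorical keys, row offsets), the exported Python model can hard-code their dtypes instead of reading them from the `Workflow`'s output schema. A rough sketch of the idea, with assumed tensor names and dtypes (the merged change may differ in detail):

```python
# Sketch of the hard-coding approach described in the commits above
# (illustrative; not the merged NVTabular code). The HugeCTR Triton backend
# always takes the same three tensors, so their dtypes can be fixed up front
# rather than read from the Workflow's output schema.
import numpy as np

# Assumed names/dtypes for a HugeCTR ensemble with I64 keys
# (matching "Use I64 input key: True" in the log above).
HUGECTR_OUTPUT_DTYPES = {
    "DES": np.float32,      # dense features
    "CATCOLUMN": np.int64,  # categorical keys
    "ROWINDEX": np.int32,   # per-slot row offsets
}


class TritonPythonModel:
    def initialize(self, args):
        # No schema lookup: the dtypes are known a priori for HugeCTR,
        # so a missing schema entry can no longer produce a None lookup.
        self.output_dtypes = dict(HUGECTR_OUTPUT_DTYPES)
```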
I tested the Criteo HugeCTR Inference Example and it worked for me.
yes please
On Thu, Apr 28, 2022 at 14:53, viswa-nvidia wrote:
@yingcanw @zehuanw @jconwayNV I’m with @karlhigley on this. We need to move away from manually creating JSON files as part of our config.
I’m going to track that part of the issue here and close this one, but I don’t think a Triton ensemble creation process that requires our customers to manually create a config file and place it in the exported Triton model repo directory is very user-friendly. 😕
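For context, the manually created config file discussed here is the HugeCTR parameter-server JSON that the backend is pointed at on its cmdline (shown in the Backend table above as `{"cmdline":{"ps":"/ensemble_models/ps.json"}}`). A minimal sketch of generating that file programmatically instead of writing it by hand; the field names, values, and paths are illustrative and should be checked against the HugeCTR backend documentation for the container version in use:

```python
# Sketch: generate the HugeCTR parameter-server config (ps.json) instead of
# hand-writing it. Field names, values, and model paths are illustrative;
# verify them against the HugeCTR backend docs for the NGC container in use.
import json

ps_config = {
    "supportlonglong": True,  # matches "Use I64 input key: True" in the log
    "models": [
        {
            "model": "deepfm",
            "network_file": "/ensemble_models/deepfm/1/deepfm.json",
            "dense_file": "/ensemble_models/deepfm/1/deepfm_dense_0.model",
            "sparse_files": ["/ensemble_models/deepfm/1/deepfm0_sparse_0.model"],
            "deployed_device_list": [0],
            "max_batch_size": 64,
        }
    ],
}

with open("/ensemble_models/ps.json", "w") as f:
    json.dump(ps_config, f, indent=4)
```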
I have shared the exported model (including the Triton config) on Slack; the repro is a bit complicated, so let me know if you still need it.