Merlin: [BUG] Cannot load an exported DeepFM model with the NGC 22.03 inference container
Running into the following errors when starting Triton:
I0318 00:00:18.082645 172 hugectr.cc:1926] TRITONBACKEND_ModelInstanceInitialize: deepfm_0 (device 0)
I0318 00:00:18.082694 172 hugectr.cc:1566] Triton Model Instance Initialization on device 0
I0318 00:00:18.082792 172 hugectr.cc:1576] Dense Feature buffer allocation:
I0318 00:00:18.083026 172 hugectr.cc:1583] Categorical Feature buffer allocation:
I0318 00:00:18.083095 172 hugectr.cc:1601] Categorical Row Index buffer allocation:
I0318 00:00:18.083143 172 hugectr.cc:1611] Predict result buffer allocation:
I0318 00:00:18.083203 172 hugectr.cc:1939] ******Loading HugeCTR Model******
I0318 00:00:18.083217 172 hugectr.cc:1631] The model origin json configuration file path is: /ensemble_models/deepfm/1/deepfm.json
[HCTR][00:00:18][INFO][RK0][main]: Global seed is 1305961709
[HCTR][00:00:19][WARNING][RK0][main]: Peer-to-peer access cannot be fully enabled.
[HCTR][00:00:19][INFO][RK0][main]: Start all2all warmup
[HCTR][00:00:19][INFO][RK0][main]: End all2all warmup
[HCTR][00:00:19][INFO][RK0][main]: Create inference session on device: 0
[HCTR][00:00:19][INFO][RK0][main]: Model name: deepfm
[HCTR][00:00:19][INFO][RK0][main]: Use mixed precision: False
[HCTR][00:00:19][INFO][RK0][main]: Use cuda graph: True
[HCTR][00:00:19][INFO][RK0][main]: Max batchsize: 64
[HCTR][00:00:19][INFO][RK0][main]: Use I64 input key: True
[HCTR][00:00:19][INFO][RK0][main]: start create embedding for inference
[HCTR][00:00:19][INFO][RK0][main]: sparse_input name data1
[HCTR][00:00:19][INFO][RK0][main]: create embedding for inference success
[HCTR][00:00:19][INFO][RK0][main]: Inference stage skip BinaryCrossEntropyLoss layer, replaced by Sigmoid layer
I0318 00:00:19.826815 172 hugectr.cc:1639] ******Loading HugeCTR model successfully
I0318 00:00:19.827763 172 model_repository_manager.cc:1149] successfully loaded 'deepfm' version 1
E0318 00:00:19.827767 172 model_repository_manager.cc:1152] failed to load 'deepfm_nvt' version 1: Internal: TypeError: 'NoneType' object is not subscriptable
At:
/ensemble_models/deepfm_nvt/1/model.py(91): _set_output_dtype
/ensemble_models/deepfm_nvt/1/model.py(76): initialize
E0318 00:00:19.827960 172 model_repository_manager.cc:1332] Invalid argument: ensemble 'deepfm_ens' depends on 'deepfm_nvt' which has no loaded version
I0318 00:00:19.828048 172 server.cc:522]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0318 00:00:19.828117 172 server.cc:549]
+---------+---------------------------------------------------------+-----------------------------------------------+
| Backend | Path | Config |
+---------+---------------------------------------------------------+-----------------------------------------------+
| hugectr | /opt/tritonserver/backends/hugectr/libtriton_hugectr.so | {"cmdline":{"ps":"/ensemble_models/ps.json"}} |
+---------+---------------------------------------------------------+-----------------------------------------------+
I0318 00:00:19.828209 172 server.cc:592]
+------------+---------+--------------------------------------------------------------------------+
| Model | Version | Status |
+------------+---------+--------------------------------------------------------------------------+
| deepfm | 1 | READY |
| deepfm_nvt | 1 | UNAVAILABLE: Internal: TypeError: 'NoneType' object is not subscriptable |
| | | |
| | | At: |
| | | /ensemble_models/deepfm_nvt/1/model.py(91): _set_output_dtype |
| | | /ensemble_models/deepfm_nvt/1/model.py(76): initialize |
+------------+---------+--------------------------------------------------------------------------+
I0318 00:00:19.845925 172 metrics.cc:623] Collecting metrics for GPU 0: Tesla T4
I0318 00:00:19.846404 172 tritonserver.cc:1932]
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.19.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_mem |
| | ory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /ensemble_models |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------+
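The `deepfm` HugeCTR model itself loads, but the NVTabular workflow model `deepfm_nvt` fails in `initialize` → `_set_output_dtype`, where a dtype lookup evidently comes back as `None` and is then subscripted. A minimal sketch of that failure mode is below; the names are hypothetical stand-ins, not the actual generated `model.py`:

```python
# Minimal illustration of the failure mode (hypothetical stand-ins, not the
# generated NVTabular model.py): looking up a column that is missing from
# the workflow's output schema returns None, and subscripting None raises
# TypeError: 'NoneType' object is not subscriptable.
import numpy as np

# Stand-in for the workflow's output schema: column name -> metadata dict.
output_schema = {
    "DES": {"dtype": np.float32},
    "CATCOLUMN": {"dtype": np.int64},
    # "ROWINDEX" is missing, e.g. its dtype was never recorded at export time.
}

def _set_output_dtype(name):
    column = output_schema.get(name)  # returns None for a missing column
    return column["dtype"]            # TypeError when column is None

def initialize():
    return {name: _set_output_dtype(name)
            for name in ("DES", "CATCOLUMN", "ROWINDEX")}

initialize()  # raises TypeError: 'NoneType' object is not subscriptable
```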
Aha! Link: https://nvaiinfa.aha.io/features/MERLIN-818
About this issue
- State: closed
- Created 2 years ago
- Comments: 26 (12 by maintainers)
Commits related to this issue
- Hard-code the `Workflow` output dtypes in Triton Since HugeCTR always expects the same three fields, we don't have to consult the `Workflow`'s output schema to determine the dtypes. We can just hard-... — committed to karlhigley/NVTabular by karlhigley 2 years ago
- Hard-code the `Workflow` output dtypes in Triton (#1468) Since HugeCTR always expects the same three fields, we don't have to consult the `Workflow`'s output schema to determine the dtypes. We can ju... — committed to NVIDIA-Merlin/NVTabular by karlhigley 2 years ago
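The fix in these commits sidesteps the schema lookup: since the HugeCTR backend always consumes the same three tensors (dense features, categorical keys, row offsets), the exported Python model can hard-code their dtypes instead of reading them from the `Workflow`'s output schema. A rough sketch of the idea, with assumed tensor names and dtypes (the merged change may differ in detail):

```python
# Sketch of the hard-coding approach described in the commits above
# (illustrative; not the merged NVTabular code). The HugeCTR Triton backend
# always takes the same three tensors, so their dtypes can be fixed up front
# rather than read from the Workflow's output schema.
import numpy as np

# Assumed names/dtypes for a HugeCTR ensemble with I64 keys
# (matching "Use I64 input key: True" in the log above).
HUGECTR_OUTPUT_DTYPES = {
    "DES": np.float32,      # dense features
    "CATCOLUMN": np.int64,  # categorical keys
    "ROWINDEX": np.int32,   # per-slot row offsets
}


class TritonPythonModel:
    def initialize(self, args):
        # No schema lookup: the dtypes are known a priori for HugeCTR,
        # so a missing schema entry can no longer produce a None lookup.
        self.output_dtypes = dict(HUGECTR_OUTPUT_DTYPES)
```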
I tested the Criteo HugeCTR Inference Example and it worked for me.
yes please
On Thu, Apr 28, 2022 at 14:53, viswa-nvidia wrote:
@yingcanw @zehuanw @jconwayNV I’m with @karlhigley on this. We need to move away from manually creating JSON files as part of our config.
I’m going to track that part of the issue here and close this one, but I don’t think a Triton ensemble creation process that requires our customers to manually create a config file and place it in the exported Triton model repo directory is very user-friendly. 😕
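For context, the manually created config file discussed here is the HugeCTR parameter-server JSON that the backend is pointed at on its cmdline (shown in the Backend table above as `{"cmdline":{"ps":"/ensemble_models/ps.json"}}`). A minimal sketch of generating that file programmatically instead of writing it by hand; the field names, values, and paths are illustrative and should be checked against the HugeCTR backend documentation for the container version in use:

```python
# Sketch: generate the HugeCTR parameter-server config (ps.json) instead of
# hand-writing it. Field names, values, and model paths are illustrative;
# verify them against the HugeCTR backend docs for the NGC container in use.
import json

ps_config = {
    "supportlonglong": True,  # matches "Use I64 input key: True" in the log
    "models": [
        {
            "model": "deepfm",
            "network_file": "/ensemble_models/deepfm/1/deepfm.json",
            "dense_file": "/ensemble_models/deepfm/1/deepfm_dense_0.model",
            "sparse_files": ["/ensemble_models/deepfm/1/deepfm0_sparse_0.model"],
            "deployed_device_list": [0],
            "max_batch_size": 64,
        }
    ],
}

with open("/ensemble_models/ps.json", "w") as f:
    json.dump(ps_config, f, indent=4)
```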
I have shared the exported model (including the Triton config) on Slack; the repro is a bit complicated, so let me know if you still need it.