AMDMIGraphX: GPT2-Large model not running in gfx1100

When running the ORT benchmark on a gfx1100 GPU, I received the following error message:

root@aus-navi3x-04:/workspace/onnxruntime# python /workspace/onnxruntime/build/Release/onnxruntime/transformers/benchmark.py -g -m gpt2-large --model_class AutoModelForCausalLM  --sequence_length 32 384 --batch_sizes 1 8  --provider=migraphx -p fp16 --disable_gelu --disable_layer_norm --disable_attention --disable_skip_layer_norm --disable_embed_layer_norm --disable_bias_skip_layer_norm --disable_bias_gelu  -d /tmp/results.csv
Arguments: Namespace(models=['gpt2-large'], model_source='pt', model_class='AutoModelForCausalLM', engines=['onnxruntime'], cache_dir='./cache_models', onnx_dir='./onnx_models', use_gpu=True, provider='migraphx', precision=<Precision.FLOAT16: 'fp16'>, verbose=False, overwrite=False, optimizer_info=<OptimizerInfo.BYSCRIPT: 'by_script'>, validate_onnx=False, fusion_csv=None, detail_csv='/tmp/results.csv', result_csv=None, input_counts=[1], test_times=100, batch_sizes=[1, 8], sequence_lengths=[32, 384], disable_ort_io_binding=False, num_threads=[16], force_num_layers=None, disable_attention=True, disable_skip_layer_norm=True, disable_embed_layer_norm=True, disable_bias_skip_layer_norm=True, disable_bias_gelu=True, disable_layer_norm=True, disable_gelu=True, enable_gelu_approximation=False, disable_shape_inference=False, enable_gemm_fast_gelu=False, use_mask_index=False, use_raw_attention_mask=False, no_attention_mask=False, use_multi_head_attention=False, disable_group_norm=False, disable_packed_kv=False, disable_packed_qkv=False, disable_bias_add=False, disable_bias_splitgelu=False, disable_nhwc_conv=False, use_group_norm_channels_first=False)
Model class name: AutoModelForCausalLM
Skip export since model existed: ./onnx_models/gpt2_large_1/gpt2_large_1.onnx
Skip optimization since model existed: ./onnx_models/gpt2_large_1_fp16_gpu/gpt2_large_1_fp16_gpu.onnx
Run onnxruntime on gpt2-large with input shape [1, 32]
Exception
Traceback (most recent call last):
  File "/workspace/onnxruntime/build/Release/onnxruntime/transformers/benchmark.py", line 890, in main
    results += run_onnxruntime(
  File "/workspace/onnxruntime/build/Release/onnxruntime/transformers/benchmark.py", line 289, in run_onnxruntime
    result = inference_ort_with_io_binding(
  File "/workspace/onnxruntime/build/Release/onnxruntime/transformers/benchmark_helper.py", line 355, in inference_ort_with_io_binding
    allocateOutputBuffers(output_buffers, output_buffer_max_sizes, device)
  File "/workspace/onnxruntime/build/Release/onnxruntime/transformers/benchmark_helper.py", line 389, in allocateOutputBuffers
    output_buffers.append(torch.empty(i, dtype=torch.float32, device=device))
torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 590.00 MiB (GPU 0; 44.98 GiB total capacity; 40.26 GiB already allocated; 17179869183.67 GiB free; 40.33 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_HIP_ALLOC_CONF

Need to determine if gfx1100 can run this model. Going by the other numbers in the message (44.98 GiB total capacity, 40.26 GiB already allocated), roughly 4.7 GiB should be free, and the failed allocation is only 590 MiB (~0.5 GiB). That request size is consistent with allocateOutputBuffers preallocating the fp32 logits buffer for the largest configuration (8 × 384 × 50257 × 4 bytes ≈ 589 MiB), since the traceback shows buffers are sized from output_buffer_max_sizes and created as torch.float32 even at fp16 precision. I don't think the "17179869183.67 GiB free" figure is correct: converted to bytes it is just under 2^64 (2^64 B = 17,179,869,184 GiB), which looks like a slightly negative free-memory value being printed as an unsigned 64-bit integer.
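
As an unverified first step, the error text itself suggests tuning the HIP caching allocator. A sketch of what that would look like, where the 512 value is only an example and "..." stands for the same benchmark arguments as above:

PYTORCH_HIP_ALLOC_CONF=max_split_size_mb:512 python /workspace/onnxruntime/build/Release/onnxruntime/transformers/benchmark.py ...

This can only help if the failure is fragmentation of the ~4.7 GiB that should still be free; if the bogus free counter reflects an actual accounting bug, it won't.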

Most upvoted comments

Had to use

bin/driver perf /code/onnx_models/onnxrt_gpt2_large/onnx_models/gpt2_large_1_fp16_gpu/gpt2_large_1_fp16_gpu.onnx --input-dim @input_dims 1 384 --fill1 --fp16

The command above is wrong: it does not pass the name of the parameter to fill to --fill1. It seems like we are silently ignoring the flag when no parameter is passed.
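
For reference, a corrected invocation would name the parameter to fill after --fill1 (and the same name after @ in --input-dim). The parameter name input_ids below is an assumption based on the usual GPT-2 ONNX export; substitute the model's actual input name:

bin/driver perf /code/onnx_models/onnxrt_gpt2_large/onnx_models/gpt2_large_1_fp16_gpu/gpt2_large_1_fp16_gpu.onnx --input-dim @input_ids 1 384 --fill1 input_ids --fp16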