AMDMIGraphX: GPT2-Large model not running on gfx1100
When running the ORT transformers benchmark on a gfx1100 GPU, I received the following message:
```
root@aus-navi3x-04:/workspace/onnxruntime# python /workspace/onnxruntime/build/Release/onnxruntime/transformers/benchmark.py -g -m gpt2-large --model_class AutoModelForCausalLM --sequence_length 32 384 --batch_sizes 1 8 --provider=migraphx -p fp16 --disable_gelu --disable_layer_norm --disable_attention --disable_skip_layer_norm --disable_embed_layer_norm --disable_bias_skip_layer_norm --disable_bias_gelu -d /tmp/results.csv
Arguments: Namespace(models=['gpt2-large'], model_source='pt', model_class='AutoModelForCausalLM', engines=['onnxruntime'], cache_dir='./cache_models', onnx_dir='./onnx_models', use_gpu=True, provider='migraphx', precision=<Precision.FLOAT16: 'fp16'>, verbose=False, overwrite=False, optimizer_info=<OptimizerInfo.BYSCRIPT: 'by_script'>, validate_onnx=False, fusion_csv=None, detail_csv='/tmp/results.csv', result_csv=None, input_counts=[1], test_times=100, batch_sizes=[1, 8], sequence_lengths=[32, 384], disable_ort_io_binding=False, num_threads=[16], force_num_layers=None, disable_attention=True, disable_skip_layer_norm=True, disable_embed_layer_norm=True, disable_bias_skip_layer_norm=True, disable_bias_gelu=True, disable_layer_norm=True, disable_gelu=True, enable_gelu_approximation=False, disable_shape_inference=False, enable_gemm_fast_gelu=False, use_mask_index=False, use_raw_attention_mask=False, no_attention_mask=False, use_multi_head_attention=False, disable_group_norm=False, disable_packed_kv=False, disable_packed_qkv=False, disable_bias_add=False, disable_bias_splitgelu=False, disable_nhwc_conv=False, use_group_norm_channels_first=False)
Model class name: AutoModelForCausalLM
Skip export since model existed: ./onnx_models/gpt2_large_1/gpt2_large_1.onnx
Skip optimization since model existed: ./onnx_models/gpt2_large_1_fp16_gpu/gpt2_large_1_fp16_gpu.onnx
Run onnxruntime on gpt2-large with input shape [1, 32]
Exception
Traceback (most recent call last):
  File "/workspace/onnxruntime/build/Release/onnxruntime/transformers/benchmark.py", line 890, in main
    results += run_onnxruntime(
  File "/workspace/onnxruntime/build/Release/onnxruntime/transformers/benchmark.py", line 289, in run_onnxruntime
    result = inference_ort_with_io_binding(
  File "/workspace/onnxruntime/build/Release/onnxruntime/transformers/benchmark_helper.py", line 355, in inference_ort_with_io_binding
    allocateOutputBuffers(output_buffers, output_buffer_max_sizes, device)
  File "/workspace/onnxruntime/build/Release/onnxruntime/transformers/benchmark_helper.py", line 389, in allocateOutputBuffers
    output_buffers.append(torch.empty(i, dtype=torch.float32, device=device))
torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 590.00 MiB (GPU 0; 44.98 GiB total capacity; 40.26 GiB already allocated; 17179869183.67 GiB free; 40.33 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_HIP_ALLOC_CONF
```
Need to determine whether gfx1100 can run this model. The numbers in the message imply roughly 4.7 GiB free (44.98 GiB total capacity minus 40.26 GiB already allocated), while the failed allocation is only 590 MiB. The reported "17179869183.67 GiB free" cannot be correct.
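That suspicion checks out. A quick arithmetic sanity check (mine, not from the thread) shows the bogus figure is exactly what you get when a slightly negative free-byte count wraps around an unsigned 64-bit counter:

```python
# 2**64 bytes expressed in GiB: the ceiling an unsigned 64-bit byte
# counter wraps around to.
print(2**64 / 2**30)  # 17179869184.0

# If the computed free-byte count goes ~0.33 GiB negative before being
# interpreted as uint64, it wraps to just under 2**64 bytes:
deficit_bytes = int(0.33 * 2**30)
print((2**64 - deficit_bytes) / 2**30)  # ~17179869183.67, matching the log
```

In other words, the allocator's internal free-memory computation went about 0.33 GiB negative and wrapped around, so the ~4.7 GiB derived from total minus allocated is the number to trust.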
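Separately, the allocator message itself suggests a mitigation worth trying: capping the block split size via PYTORCH_HIP_ALLOC_CONF. A minimal sketch, with 512 MiB as an arbitrary starting point rather than a tuned value:

```python
import os

# Must be set before torch initializes the HIP caching allocator,
# i.e. before the first import of torch in the process.
os.environ["PYTORCH_HIP_ALLOC_CONF"] = "max_split_size_mb:512"

import torch  # imported after setting the env var so it takes effect
```

Equivalently, prefix the benchmark command with `PYTORCH_HIP_ALLOC_CONF=max_split_size_mb:512` in the shell.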
The command above is wrong: you are not passing the name of the parameter to `fill1`. It seems we are silently ignoring the flag when no parameters are passed.
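For reference (and this is a guess, since the command being corrected is not quoted above): the `--fill1` flag of `migraphx-driver` takes the name of a graph parameter to fill with ones, e.g. `input_ids` for this GPT-2 export, so a fixed invocation might look like:

```
migraphx-driver perf ./onnx_models/gpt2_large_1_fp16_gpu/gpt2_large_1_fp16_gpu.onnx --fill1 input_ids
```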