optimum: GPT-NeoX quantize error
System Info
```
optimum==1.8.6
transformers==4.29.2
torch==2.0.1+cu117
```
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
I trained a `GPTNeoXForCausalLM` model (with no code modifications). The model config is:
```json
{
  "_name_or_path": "",
  "architectures": [
    "GPTNeoXForCausalLM"
  ],
  "bos_token_id": 0,
  "classifier_dropout": 0.1,
  "eos_token_id": 0,
  "hidden_act": "gelu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 20480,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 2048,
  "model_type": "gpt_neox",
  "num_attention_heads": 40,
  "num_hidden_layers": 40,
  "num_steps": "global_step301000",
  "rotary_emb_base": 10000,
  "rotary_pct": 0.5,
  "tie_word_embeddings": false,
  "transformers_version": "4.29.2",
  "use_cache": false,
  "use_parallel_residual": true,
  "vocab_size": 30080
}
```
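For reference, the checkpoint loads with an unmodified transformers install, along the lines of:

```python
from transformers import AutoConfig, GPTNeoXForCausalLM

# The config above should resolve to the stock GPT-NeoX architecture.
config = AutoConfig.from_pretrained("/workspace/model_path")
assert config.model_type == "gpt_neox"

model = GPTNeoXForCausalLM.from_pretrained("/workspace/model_path")
```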
Then I exported the model to ONNX with `optimum-cli`:
```shell
optimum-cli export onnx --task text-generation --atol 1e-4 --model /workspace/model_path /workspace/onnx_default --framework pt
```
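The Python API offers roughly the same export path (a sketch; the CLI's `--atol` validation has no direct counterpart here):

```python
from optimum.onnxruntime import ORTModelForCausalLM

# export=True converts the PyTorch checkpoint to ONNX on load.
ort_model = ORTModelForCausalLM.from_pretrained("/workspace/model_path", export=True)
ort_model.save_pretrained("/workspace/onnx_default")
```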
Next, I quantized it with `optimum-cli`:
```shell
optimum-cli onnxruntime quantize --onnx_model /workspace/onnx_default -o /workspace/onnx_default/quantize --tensorrt
```
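The Python equivalent of this call looks roughly as follows and fails the same way; `file_name` is an assumption here, since the text-generation export writes several `.onnx` files and `ORTQuantizer` works on one at a time:

```python
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

quantizer = ORTQuantizer.from_pretrained("/workspace/onnx_default", file_name="decoder_model.onnx")
qconfig = AutoQuantizationConfig.tensorrt(per_channel=False)

# Raises the ValueError below: the TensorRT scheme is static, and no
# calibration ranges are provided.
quantizer.quantize(save_dir="/workspace/onnx_default/quantize", quantization_config=qconfig)
```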
This raises the following error:
```
Creating static quantizer: QDQ (mode: QLinearOps, schema: s8/s8, channel-wise: False)
Quantizing model...
Traceback (most recent call last):
  File "/usr/local/bin/optimum-cli", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/optimum/commands/optimum_cli.py", line 163, in main
    service.run()
  File "/usr/local/lib/python3.10/dist-packages/optimum/commands/onnxruntime/quantize.py", line 102, in run
    q.quantize(save_dir=save_dir, quantization_config=qconfig)
  File "/usr/local/lib/python3.10/dist-packages/optimum/onnxruntime/quantization.py", line 409, in quantize
    quantizer.quantize_model()
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/quantization/qdq_quantizer.py", line 217, in quantize_model
    self._quantize_normal_tensors()
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/quantization/qdq_quantizer.py", line 385, in _quantize_normal_tensors
    raise ValueError(
ValueError: Quantization parameters are not specified for param /gpt_neox/layers.0/input_layernorm/ReduceMean_1_output_0. In static mode quantization params for inputs and outputs of nodes to be quantized are required.
```
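For context, `--tensorrt` selects a static QDQ scheme, and static quantization requires precomputed activation ranges for every tensor to be quantized, which is exactly what this `ValueError` is about. With the Python API those ranges come from a calibration pass before `quantize()`; a minimal sketch, using an arbitrary calibration dataset and the hypothetical `file_name` from above:

```python
from functools import partial

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoCalibrationConfig, AutoQuantizationConfig

tokenizer = AutoTokenizer.from_pretrained("/workspace/model_path")
quantizer = ORTQuantizer.from_pretrained("/workspace/onnx_default", file_name="decoder_model.onnx")
qconfig = AutoQuantizationConfig.tensorrt(per_channel=False)

def preprocess_fn(examples, tokenizer):
    return tokenizer(examples["text"], truncation=True, max_length=128)

# Static quantization needs representative activations; wikitext-2 is an
# arbitrary stand-in for data matching the model's domain.
calibration_dataset = quantizer.get_calibration_dataset(
    "wikitext",
    dataset_config_name="wikitext-2-raw-v1",
    preprocess_function=partial(preprocess_fn, tokenizer=tokenizer),
    num_samples=64,
    dataset_split="train",
)
calibration_config = AutoCalibrationConfig.minmax(calibration_dataset)

# Compute the per-tensor ranges the error message asks for.
ranges = quantizer.fit(
    dataset=calibration_dataset,
    calibration_config=calibration_config,
    operators_to_quantize=qconfig.operators_to_quantize,
)
quantizer.quantize(
    save_dir="/workspace/onnx_default/quantize",
    quantization_config=qconfig,
    calibration_tensors_range=ranges,
)
```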
Expected behavior
Quantization succeeds.
About this issue
- State: closed
- Created a year ago
- Comments: 15 (2 by maintainers)
Commits related to this issue
- Update: #1090 code change — committed to YooSungHyun/optimum by YooSungHyun a year ago
`--avx512` works fine; what remains is just an ONNX model size error, and I think it would work if `--use_external_data_format=True` were given.
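For reference, the `avx512` scheme is dynamic, so it needs no calibration pass, which is presumably why it succeeds where `--tensorrt` does not. A sketch combining it with the external-data option (assuming the flag maps to the `use_external_data_format` keyword of `ORTQuantizer.quantize`):

```python
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

quantizer = ORTQuantizer.from_pretrained("/workspace/onnx_default", file_name="decoder_model.onnx")

# Dynamic quantization: activation ranges are computed at runtime, so no
# calibration pass is required.
qconfig = AutoQuantizationConfig.avx512(is_static=False, per_channel=False)

quantizer.quantize(
    save_dir="/workspace/onnx_default/quantize_avx512",
    quantization_config=qconfig,
    use_external_data_format=True,  # needed when the model exceeds ONNX's 2 GB protobuf limit
)
```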