onnxruntime
109a5f5b - Fix large model check in Intel's Neural Compressor (#27746)

Commit
72 days ago
### Description

This PR updates the logic for identifying a large model in Intel's Neural Compressor.

### Motivation and Context

The original logic was not sufficient to detect whether a model produced by the model builder is too large or not. Here is an example traceback from an internal customer.

```
Traceback (most recent call last):
  File "D:\a\_work\1\s\edge.onnxruntime-genai\src\python\py\models\builder.py", line 502, in <module>
    create_model(
  File "C:\ToolCache\Python\3.12.10\x64\Lib\site-packages\torch\utils\_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\a\_work\1\s\edge.onnxruntime-genai\src\python\py\models\builder.py", line 346, in create_model
    onnx_model.save_model(output_dir)
  File "D:\a\_work\1\s\edge.onnxruntime-genai\src\python\py\models\builders\base.py", line 748, in save_model
    model = self.to_int4()
            ^^^^^^^^^^^^^^
  File "D:\a\_work\1\s\edge.onnxruntime-genai\src\python\py\models\builders\base.py", line 738, in to_int4
    quant.process()
  File "C:\ToolCache\Python\3.12.10\x64\Lib\site-packages\onnxruntime\quantization\matmul_nbits_quantizer.py", line 1442, in process
    self.int4_quant_algo()
  File "C:\ToolCache\Python\3.12.10\x64\Lib\site-packages\onnxruntime\quantization\matmul_nbits_quantizer.py", line 1388, in int4_quant_algo
    self.model = rtn_quantize(
                 ^^^^^^^^^^^^^
  File "C:\ToolCache\Python\3.12.10\x64\Lib\site-packages\onnxruntime\quantization\neural_compressor\weight_only.py", line 456, in rtn_quantize
    model = ONNXModel(model)
            ^^^^^^^^^^^^^^^^
  File "C:\ToolCache\Python\3.12.10\x64\Lib\site-packages\onnxruntime\quantization\neural_compressor\onnx_model.py", line 52, in __init__
    self.check_is_large_model()
  File "C:\ToolCache\Python\3.12.10\x64\Lib\site-packages\onnxruntime\quantization\neural_compressor\onnx_model.py", line 91, in check_is_large_model
    raise e
  File "C:\ToolCache\Python\3.12.10\x64\Lib\site-packages\onnxruntime\quantization\neural_compressor\onnx_model.py", line 84, in check_is_large_model
    init_bytes = init.SerializeToString()
                 ^^^^^^^^^^^^^^^^^^^^^^^^
google.protobuf.message.EncodeError: Failed to serialize proto
```
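The traceback shows the size check failing because it serializes each initializer, which protobuf refuses to do once a message reaches its 2 GiB limit. As an illustration only, and not the change made in this PR, one way to estimate model size without serializing anything is to sum the length of each initializer's `raw_data` payload. The function name `is_large_model`, the constant `MAX_PROTOBUF_BYTES`, and the stand-in tensor objects below are hypothetical. The sketch also assumes weights are stored in `raw_data`; tensors kept in typed fields or in external files would not be counted.

```python
from types import SimpleNamespace

# Assumption: flag the model once its weights approach protobuf's
# 2 GiB per-message serialization limit.
MAX_PROTOBUF_BYTES = 2**31 - 1


def is_large_model(initializers, threshold=MAX_PROTOBUF_BYTES):
    """Return True if the summed raw_data payload exceeds `threshold`.

    `initializers` is any iterable of objects exposing a bytes-like
    `raw_data` attribute (as onnx TensorProto does). Nothing is
    serialized, so no protobuf EncodeError can be raised here.
    """
    total = 0
    for init in initializers:
        total += len(init.raw_data)
        if total > threshold:
            # Stop early: the remaining tensors cannot change the answer.
            return True
    return False


# Hypothetical stand-ins for TensorProto objects, with a small threshold
# so the example runs without allocating gigabytes.
small = [SimpleNamespace(raw_data=b"\x00" * 10) for _ in range(3)]
result_under = is_large_model(small, threshold=100)
result_over = is_large_model(small, threshold=20)
```

With a real model, the iterable would be `model.graph.initializer`, under the same assumption that the weights live in `raw_data`.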