onnxruntime
0318be22 - [TRTRTX EP] Support custom-ops used in NVFP4 recipe (#26555)

### Description
- This change enables the FP4 data type and the NVFP4 recipe's custom ops in the TRTRTX EP.

### Motivation and Context
- NVIDIA's NVFP4 quantization recipe currently uses custom ops for operations such as FP4 dynamic and double quantization and FP8 Q/DQ in MHA. These custom ops are natively supported by TensorRT RTX (i.e. without requiring a plugin).
- An NVFP4 model (e.g. an NVFP4 Flux or SD model) runs through a CLI tool like tensorrt_rtx, but it fails when run through onnxruntime's TRTRTX EP because of the unrecognized custom ops and the FP4 data type.
- To allow NVFP4 models to run through onnxruntime's TRTRTX EP, this change adds support for the FP4 data type and the NVFP4-related custom ops in the TRTRTX EP.
- Validated with the following setup: SD3.5-medium (with FP4 transformer) + optimum-onnxruntime SD pipeline + Windows 11 22621 + RTX 5090 + text-to-image modality. The inference run produced an image for the text input and no errors were thrown. A minimal session-creation sketch is shown after this description.
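Below is a minimal sketch of the usage path this change enables: creating an onnxruntime inference session that targets the TensorRT RTX EP for an NVFP4 model. The provider name string, the model path, and the dummy-input handling are assumptions for illustration rather than details taken from this commit; confirm the actual provider name against `onnxruntime.get_available_providers()` on your build.

```python
# Minimal sketch (assumptions noted in comments): run an NVFP4 ONNX model via
# onnxruntime with the TensorRT RTX EP preferred and CPU as a fallback.
import numpy as np
import onnxruntime as ort

MODEL_PATH = "sd35_medium_fp4_transformer.onnx"  # hypothetical NVFP4 model path

# Provider name string is an assumption; verify with ort.get_available_providers().
providers = ["NvTensorRTRTXExecutionProvider", "CPUExecutionProvider"]

session = ort.InferenceSession(MODEL_PATH, providers=providers)

# Build dummy float32 feeds, resolving dynamic dimensions to 1, just to exercise
# the session; a real pipeline (e.g. optimum-onnxruntime) supplies actual
# latents/embeddings with the correct dtypes here.
feeds = {}
for inp in session.get_inputs():
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    feeds[inp.name] = np.zeros(shape, dtype=np.float32)

outputs = session.run(None, feeds)
print([o.shape for o in outputs])
```

In the validated setup described above, session creation and input preparation would normally be handled by the optimum-onnxruntime Stable Diffusion pipeline rather than written by hand.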