onnxruntime
1765da17 - QDQ transformations in the OpenVINO EP for the NPU device (#20622)

Commit
360 days ago
We introduce rulesets that eliminate QDQ nodes of unsupported types and for unsupported quantised operators for the NPU device. This leads to improved performance and accuracy on critical client AI models.

Here's a summary of the changes:

- Introduces the provider option `enable_qdq_optimizer`, which when set to `True` enables stripping of QDQ nodes on the NPU device for models with `QuantizeLinear` and `DequantizeLinear` layers in them. `enable_qdq_optimizer` defaults to `False`.
- Always strips out int16/uint16 QDQ layers, as these types are not supported by the NPU compiler.
- Only the supported ops `Conv`, `MatMul`, and `Add` retain QDQ layers around them; these were specifically identified for optimal inference performance. OpenVINO EP achieves this by iterating through NodeUnits in the QDQ model and reconstructing the graph with only the required layers.
- Added provider APIs to manipulate node units from EP code, by @adrianlizarraga.
- Added a capability rule for the Pad operator when it takes DQ layers as input.
- Fixes from the static code analysis tool.

---------

Co-authored-by: adrianlizarraga <adlizarraga@microsoft.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: saurabhkale17 <saurabh1.kale@intel.com>
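As a rough illustration of the option and the retention rule described in this commit message, here is a minimal Python sketch. It is not the EP's implementation, which lives in C++ under `qdq_transformations/`. The helper names `openvino_npu_provider_options`, `keep_qdq`, and `create_session` are hypothetical, the string encoding of the option value (`"True"`/`"False"`) is an assumption, and the rule is reduced to a check on an op type and a quantized dtype rather than on real NodeUnits.

```python
from typing import Dict

# Ops that keep their surrounding QDQ layers, per the commit message.
QDQ_RETAINED_OPS = frozenset({"Conv", "MatMul", "Add"})
# Quantized types that are always stripped (unsupported by the NPU compiler).
NPU_UNSUPPORTED_QDQ_TYPES = frozenset({"int16", "uint16"})


def openvino_npu_provider_options(enable_qdq_optimizer: bool = False) -> Dict[str, str]:
    """Build a provider-options dict for the OpenVINO EP on the NPU.

    Assumption: option values are passed as strings, as is usual for
    ONNX Runtime provider options. The default mirrors the commit (False).
    """
    return {
        "device_type": "NPU",
        "enable_qdq_optimizer": "True" if enable_qdq_optimizer else "False",
    }


def keep_qdq(op_type: str, quant_dtype: str) -> bool:
    """Simplified retention rule (hypothetical helper, not the EP's code).

    QDQ layers are dropped for int16/uint16 regardless of the op, and are
    otherwise kept only around Conv, MatMul, and Add.
    """
    if quant_dtype in NPU_UNSUPPORTED_QDQ_TYPES:
        return False
    return op_type in QDQ_RETAINED_OPS


def create_session(model_path: str):
    """Create a session with the optimizer enabled.

    Requires an onnxruntime build that includes the OpenVINO EP; the import
    is local so the helpers above work without it.
    """
    import onnxruntime as ort

    return ort.InferenceSession(
        model_path,
        providers=["OpenVINOExecutionProvider"],
        provider_options=[openvino_npu_provider_options(enable_qdq_optimizer=True)],
    )
```

Under these assumptions, `keep_qdq("Conv", "uint8")` keeps the QDQ pair, while `keep_qdq("Conv", "uint16")` and `keep_qdq("Relu", "uint8")` both strip it.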
  • onnxruntime
    • core
      • providers
        • openvino
          • backend_manager.cc
          • backend_manager.h
          • backend_utils.cc
          • backend_utils.h
          • backends
            • basic_backend.cc
          • contexts.h
          • openvino_execution_provider.cc
          • openvino_execution_provider.h
          • openvino_provider_factory.cc
          • ov_interface.cc
          • ov_interface.h
          • ov_versions
            • capability.cc
            • capability.h
            • data_ops.cc
            • data_ops.h
            • utils.cc
          • qdq_transformations
            • qdq_stripping.cc
            • qdq_stripping.h
        • shared_library
          • provider_api.h
          • provider_interfaces.h
          • provider_wrappedtypes.h
      • session
        • provider_bridge_ort.cc
    • python
      • onnxruntime_pybind_state.cc
    • test/perftest
      • ort_test_session.cc