QDQ transformations in the OpenVINO EP for the NPU device (#20622)
We introduce rulesets that eliminate QDQ nodes of unsupported types and
around unsupported quantized operators for the NPU device. This improves
performance and accuracy on critical client AI models.
Here's a summary of the changes:
- Introduces the provider option `enable_qdq_optimizer`, which, when set
to `True`, enables stripping of QDQ nodes on the NPU device for models
containing `QuantizeLinear` and `DequantizeLinear` layers (see the usage
sketch after this list). `enable_qdq_optimizer` defaults to `False`.
- Always strips out int16/uint16 QDQ layers, as these types are not
supported by the NPU compiler.
- Retains QDQ layers only around the supported ops `Conv`, `MatMul`, and
`Add`, which were specifically identified for optimal inference
performance. The OpenVINO EP achieves this by iterating through the
NodeUnits in the QDQ model and reconstructing the graph with only the
required layers (an illustrative sketch of the rules follows this list).
- Adds provider APIs to manipulate NodeUnits from EP code (contributed
by @adrianlizarraga).
- Adds a capability rule for the `Pad` operator when it takes DQ layers
as input.
- Includes fixes flagged by static code analysis.
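
As a usage illustration (a minimal sketch; the model path is a
placeholder, not part of this change), the new option is passed through
the OpenVINO EP's provider options in the Python API:

```python
import onnxruntime as ort

# "model_qdq.onnx" is a placeholder; any QDQ-quantized ONNX model applies.
session = ort.InferenceSession(
    "model_qdq.onnx",
    providers=["OpenVINOExecutionProvider"],
    provider_options=[{
        "device_type": "NPU",            # target the NPU device
        "enable_qdq_optimizer": "True",  # strip unsupported QDQ layers; defaults to "False"
    }],
)
```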
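
For intuition only, here is a minimal Python sketch of the selection
rules described above, not the EP's actual C++ implementation: strip Q/DQ
nodes whose quantized type is int16/uint16, and keep DQ layers only where
they feed `Conv`, `MatMul`, or `Add`. The helper name
`qdq_nodes_to_strip` is hypothetical.

```python
import onnx
from onnx import TensorProto

SUPPORTED_QDQ_OPS = {"Conv", "MatMul", "Add"}  # ops that keep surrounding QDQ layers
UNSUPPORTED_TYPES = {TensorProto.INT16, TensorProto.UINT16}  # rejected by the NPU compiler

def qdq_nodes_to_strip(model: onnx.ModelProto) -> list[str]:
    """Hypothetical helper: return names of Q/DQ nodes these rules would strip."""
    graph = model.graph
    init_types = {init.name: init.data_type for init in graph.initializer}
    # Map each tensor name to the op types that consume it.
    consumers: dict[str, list[str]] = {}
    for node in graph.node:
        for inp in node.input:
            consumers.setdefault(inp, []).append(node.op_type)

    to_strip = []
    for node in graph.node:
        if node.op_type not in ("QuantizeLinear", "DequantizeLinear"):
            continue
        # Rule 1: always strip int16/uint16 Q/DQ layers. The optional
        # zero-point input (index 2) carries the quantized element type.
        zp = node.input[2] if len(node.input) > 2 else None
        if zp is not None and init_types.get(zp) in UNSUPPORTED_TYPES:
            to_strip.append(node.name)
            continue
        # Rule 2: keep a DQ layer only if its output feeds a supported op.
        if node.op_type == "DequantizeLinear":
            feeds = consumers.get(node.output[0], [])
            if not any(op in SUPPORTED_QDQ_OPS for op in feeds):
                to_strip.append(node.name)
    return to_strip
```

The actual change operates on NodeUnits (grouped Q/DQ plus compute-node
structures) rather than raw nodes, and reconstructs the graph with only
the required layers instead of deleting nodes in place.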
---------
Co-authored-by: adrianlizarraga <adlizarraga@microsoft.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: saurabhkale17 <saurabh1.kale@intel.com>