// Copyright (C) Intel Corporation
See https://clang.llvm.org/docs/ClangFormat.html. Run lintrunner -a to apply this patch.
Thanks for the comments @skottmckay, I will address them. Meanwhile, a question: is there an easy way to check whether a given model is QDQ and apply this transformation only in that case (other than iterating through the whole graph and checking if a Q/DQ operator exists)? We don't want to rebuild the graph from scratch for non-QDQ models.
I'm not aware of any direct way.
If it isn't a QDQ model, the number of nodes returned by GetAllNodeUnits should equal the number of nodes in the GraphViewer, so you could potentially infer it from that.
But calling GetAllNodeUnits is going to be more expensive than iterating and checking whether any node's op_type is DQ/Q as a first step, so it depends on what you want to optimize for.
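For illustration, a minimal sketch of that cheap first-step check, assuming ONNX Runtime's internal C++ GraphViewer/Node interfaces (Nodes() iterating the graph's nodes, OpType() returning the operator type string):

```cpp
#include "core/graph/graph_viewer.h"

// Cheap first-step check: scan op types before paying the cost of
// GetAllNodeUnits. (Sketch against ORT's internal GraphViewer/Node API.)
bool HasQDQNodes(const onnxruntime::GraphViewer& graph_viewer) {
  for (const auto& node : graph_viewer.Nodes()) {
    const auto& op_type = node.OpType();
    if (op_type == "QuantizeLinear" || op_type == "DequantizeLinear") {
      return true;  // at least one Q/DQ node, so treat it as a QDQ model
    }
  }
  return false;  // no Q/DQ nodes, so the stripping pass can be skipped
}
```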
I'm iterating and checking if any node's op_type is DQ/Q instead, and doing QDQ stripping only if that's true.
/azp run Linux OpenVINO CI Pipeline
/azp run Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Android CI Pipeline, Linux Android Emulator QNN CI Pipeline
/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline
We introduce rulesets that eliminate QDQ nodes of unsupported types, as well as QDQ nodes around unsupported quantised operators, for the NPU device. This leads to improved performance and accuracy on critical client AI models.
Here's a summary of the changes:
- Adds a new provider option enable_qdq_optimizer which, when set to True, enables stripping of QDQ nodes on the NPU device for models with QuantizeLinear and DequantizeLinear layers in them. enable_qdq_optimizer defaults to False.
- Conv, MatMul, and Add retain QDQ layers around them, specifically identified for optimal inference performance. The OpenVINO EP achieves this by iterating through the NodeUnits in the QDQ model and reconstructing the graph with only the required layers.
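For illustration, a minimal usage sketch from the application side, assuming the C++ AppendExecutionProvider_OpenVINO_V2 overload that takes provider options as string key/value pairs (the model path here is a placeholder):

```cpp
#include <string>
#include <unordered_map>
#include "onnxruntime_cxx_api.h"

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "ovep-qdq-example");
  Ort::SessionOptions session_options;

  // enable_qdq_optimizer defaults to False, so QDQ stripping on the NPU
  // must be opted into explicitly. (AppendExecutionProvider_OpenVINO_V2 is
  // assumed here; options are passed as string key/value pairs.)
  std::unordered_map<std::string, std::string> ov_options{
      {"device_type", "NPU"},
      {"enable_qdq_optimizer", "True"},
  };
  session_options.AppendExecutionProvider_OpenVINO_V2(ov_options);

  Ort::Session session(env, ORT_TSTR("model_qdq.onnx"), session_options);
  return 0;
}
```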