onnxruntime
QDQ transformations in the OpenVINO EP for the NPU device
#20622
Merged


sspintel · 1 year ago (edited)

We introduce rulesets that eliminate QDQ nodes of unsupported types, as well as QDQ nodes around unsupported quantized operators, for the NPU device. This leads to improved performance and accuracy on critical client AI models.

Here's a summary of the changes:

  • Introduces the provider option enable_qdq_optimizer which, when set to True, enables stripping of QDQ nodes on the NPU device for models that contain QuantizeLinear and DequantizeLinear layers. enable_qdq_optimizer defaults to False.
  • Always strips out int16/uint16 QDQ layers, as these types are not supported by the NPU compiler.
  • Retains QDQ layers only around the supported ops Conv, MatMul, and Add, which were specifically identified for optimal inference performance. The OpenVINO EP achieves this by iterating through the NodeUnits in the QDQ model and reconstructing the graph with only the required layers.
  • Adds provider APIs (by @adrianlizarraga) to manipulate node units from EP code.
  • Adds a capability rule for the Pad operator when it takes DQ layers as input.
  • Applies fixes from a static code analysis tool.
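As a rough illustration of the ruleset above (a hypothetical, self-contained sketch, not the EP's actual code; the name KeepQDQ is invented here), the decision of whether Q/DQ layers survive stripping can be thought of as:

```cpp
#include <set>
#include <string>

// ONNX TensorProto_DataType enum values for the 16-bit integer types.
constexpr int kUint16 = 4;
constexpr int kInt16 = 5;

// Hypothetical predicate: keep Q/DQ layers only around ops the NPU supports
// in quantized form, and always drop 16-bit integer Q/DQ, which the NPU
// compiler cannot consume.
bool KeepQDQ(const std::string& target_op, int zp_elem_type) {
  static const std::set<std::string> kSupportedQuantOps = {"Conv", "MatMul", "Add"};
  if (zp_elem_type == kUint16 || zp_elem_type == kInt16) return false;
  return kSupportedQuantOps.count(target_op) > 0;
}
```

The real implementation works at NodeUnit granularity rather than per-op-type strings, but the two rules (drop 16-bit types, keep only a small allowlist of target ops) are the ones described in this PR.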
jywu-msft requested a review from skottmckay 1 year ago
jywu-msft requested a review from adrianlizarraga 1 year ago
jywu-msft requested a review from jywu-msft 1 year ago
jywu-msft requested a review from HectorSVC 1 year ago
github-advanced-security commented on 2024-05-09
onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.h
// Copyright (C) Intel Corporation

github-advanced-security · 1 year ago

CLANGFORMAT/format

See https://clang.llvm.org/docs/ClangFormat.html.
Run lintrunner -a to apply this patch.


onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc
// Copyright (C) Intel Corporation

github-advanced-security · 1 year ago

CLANGFORMAT/format

See https://clang.llvm.org/docs/ClangFormat.html.
Run lintrunner -a to apply this patch.


skottmckay commented on 2024-05-10
Conversation is marked as resolved
onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc
static ONNX_NAMESPACE::TensorProto_DataType GetZeroPointDT(const Node* qdq_node) {
  return static_cast<ONNX_NAMESPACE::TensorProto_DataType>(
      qdq_node->InputDefs().at(2)->TypeAsProto()->tensor_type().elem_type());

skottmckay · 1 year ago

Optional inputs aren't guaranteed to exist, so it would be safer to check that InputDefs().size() > 2 before getting the input def at that slot.

But given the zp input may not exist, you may need to check the output type of a Q node and the input 0 type of a DQ instead of relying on the zp input for that type info.

sspintel · 1 year ago

Fixed. Now returning output type of Q and input type of DQ.
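The fixed approach can be sketched with a self-contained mock (the Node/NodeArg structs below are illustrative stand-ins, not ORT's real classes): because the zero-point input at slot 2 is optional, the quantized type is read from a Q node's output, or from a DQ node's input 0, instead.

```cpp
#include <string>
#include <vector>

// Illustrative stand-ins for ORT's NodeArg/Node, carrying only an element type.
struct NodeArg { int elem_type; };
struct Node {
  std::string op_type;
  std::vector<NodeArg> inputs;
  std::vector<NodeArg> outputs;
};

// For QuantizeLinear the quantized type appears on the output; for
// DequantizeLinear it appears on input 0. Neither depends on the optional
// zero-point input, which may be absent entirely.
int GetQDQDataType(const Node& n) {
  return (n.op_type == "QuantizeLinear") ? n.outputs.at(0).elem_type
                                         : n.inputs.at(0).elem_type;
}
```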

Conversation is marked as resolved
onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc

skottmckay · 1 year ago

nit: a TypeProto on the stack could be used with GetOrCreateNodeArg being called with the address of that.

sspintel · 1 year ago

Can you please elaborate on this?

skottmckay · 1 year ago

Sorry - forgot you're using an external library for the EP so you can't create protobuf types directly.

Conversation is marked as resolved
onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc
// Copy the original quantized type proto, but update the type to float.
std::unique_ptr<ONNX_NAMESPACE::TypeProto> type_proto = ONNX_NAMESPACE::TypeProto::Create();
type_proto->copy_from(orig_type_proto);
type_proto->mutable_tensor_type()->set_elem_type(ONNX_NAMESPACE::TensorProto_DataType_FLOAT);

skottmckay · 1 year ago

Is it valid to hardcode float given the ONNX spec allows for float/float16/bfloat16? Maybe this code would never be hit for the 16-bit float types.

Conversation is marked as resolved
onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc
for (Node::NodeConstIterator it_dq = target_node->InputNodesBegin(); it_dq != target_node->InputNodesEnd(); ++it_dq) {
  const auto& DQ = &*it_dq;
  if (DQ->OpType() != "DequantizeLinear") continue;
  is_bias |= DQ->InputDefs().at(0)->Name().find("bias") != std::string::npos;

skottmckay · 1 year ago

Looking for an input called 'bias' seems very model specific. Is there some other component that ensures there's a single 'bias' value for the graph and it has that exact name?

Naively I would have expected this check to come from say a Conv node using the bias input's value name from the node instead of a hardcoded name of "bias".

sspintel · 1 year ago

The rulesets changed to not use bias anymore, so I've removed the functions which were using it earlier.

Conversation is marked as resolved
onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc
  }
}

// Used to find if inputs of the target node DQ's are constant initializers

skottmckay · 1 year ago

nit: the scale and zp inputs of a DQ are generally constant initializers so it might be good to clarify the comment to say the check is for input 0 of the DQ being a constant initializer.

sspintel · 1 year ago

Fixed.

Conversation is marked as resolved
onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc
}

// DQs in Double QDQ cases should be kept
if (dq_node->InputDefs().at(2)->Name().find("zero_point_convert") != std::string::npos &&

skottmckay · 1 year ago

need to check the optional input exists

sspintel · 1 year ago

Now, checking for scale_convert instead of zero_point_convert.

Conversation is marked as resolved
onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc

skottmckay · 1 year ago 👍 1

@adrianlizarraga NodeUnit should provide OutputEdgeCount() so this is unnecessary

sspintel · 1 year ago

Thanks for the comments @skottmckay. I will address them. Meanwhile, a question: is there an easy way to check if a given model is QDQ and only apply this transformation if that's the case (other than iterating through the whole graph and checking if a QDQ operator exists)? We don't want to build the graph from scratch for non-QDQ models.

skottmckay · 1 year ago

> Thanks for the comments @skottmckay. I will address them. Meanwhile, a question. Is there an easy way to check if a given model is QDQ and only apply this transformation if that's the case (other than iterating through the whole graph and checking if a QDQ operator exists), as we don't want to build the graph from scratch for non-QDQ models.

I'm not aware of any direct way.

If it isn't a QDQ model the number of nodes returned by GetAllNodeUnits should be equal to the number of nodes in the GraphViewer so you could potentially infer using that.

But calling GetAllNodeUnits is going to be more expensive than iterating and checking if any node's op_type is DQ/Q as a first step so it depends what you want to optimize for.

sspintel · 1 year ago


I'm iterating and checking if any node's op_type is DQ/Q instead, and doing QDQ stripping only if it's true.
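That first-pass check amounts to a single linear scan over the nodes; a minimal self-contained sketch (with a hypothetical Node struct standing in for the graph API):

```cpp
#include <string>
#include <vector>

struct Node { std::string op_type; };  // illustrative stand-in for ORT's Node

// Cheap pre-check: treat the model as QDQ only if some node is a
// QuantizeLinear or DequantizeLinear. This is cheaper than calling
// GetAllNodeUnits, which would build every node unit just to count them.
bool IsQDQGraph(const std::vector<Node>& nodes) {
  for (const auto& n : nodes) {
    if (n.op_type == "QuantizeLinear" || n.op_type == "DequantizeLinear")
      return true;
  }
  return false;
}
```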

adrianlizarraga commented on 2024-05-13
Conversation is marked as resolved
onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc
// Copy the original quantized type proto, but update the type to float.
std::unique_ptr<ONNX_NAMESPACE::TypeProto> type_proto = ONNX_NAMESPACE::TypeProto::Create();
type_proto->copy_from(orig_type_proto);
type_proto->mutable_tensor_type()->set_elem_type(ONNX_NAMESPACE::TensorProto_DataType_FLOAT);

adrianlizarraga · 1 year ago

You could get the actual float type from the type of the scale in the Q (or DQ) op.

One way to do this is to get scale initializer's name -> get initializer TensorProto -> get the float type

const auto& src_initializers = src_graph.GetAllInitializedTensors();
const std::string& scale_initializer_name = io_def.quant_param->scale.Name();
auto tensor_proto_iter = src_initializers.find(scale_initializer_name);

// Should check that it exists. Maybe change signature of this function to return a Status.
// ORT_RETURN_IF(tensor_proto_iter == src_initializers.end(), "Unable to find scale initializer ", scale_initializer_name);

const ONNX_NAMESPACE::TensorProto* scale_tensor_proto = tensor_proto_iter->second;
int32_t float_type = scale_tensor_proto->data_type();
sspintel · 1 year ago

Fixed

adrianlizarraga commented on 2024-05-13
Conversation is marked as resolved
onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc
auto q_zero_point_dt = GetQDQDataType(&q_node);

// Can ignore if this Q is uint16 as it won't be consumed by any supported node
if (q_zero_point_dt != ONNX_NAMESPACE::TensorProto_DataType_UINT16 &&

adrianlizarraga · 1 year ago (edited)

Looks like this should check for both UINT16 and INT16 here and in other places.

sspintel · 1 year ago

Fixed
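The fix boils down to testing both 16-bit integer types. As a sketch (the helper name is invented here), using the ONNX TensorProto_DataType values UINT16 = 4 and INT16 = 5:

```cpp
// ONNX TensorProto_DataType enum values (from the ONNX protobuf spec).
constexpr int kTensorProtoUint16 = 4;
constexpr int kTensorProtoInt16 = 5;

// Hypothetical helper mirroring the fix: a Q/DQ of either 16-bit integer
// type is unsupported by the NPU compiler and can be stripped or ignored.
bool Is16BitQuantType(int elem_type) {
  return elem_type == kTensorProtoUint16 || elem_type == kTensorProtoInt16;
}
```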

adrianlizarraga commented on 2024-05-13
Conversation is marked as resolved
onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc
if (i_q_node.OpType() == "QuantizeLinear" && o_q_node.OpType() == "QuantizeLinear") {
  auto dq_zero_point_dt = GetQDQDataType(&dq_node);

  if (dq_zero_point_dt != ONNX_NAMESPACE::TensorProto_DataType_UINT16 &&

adrianlizarraga · 1 year ago

The comment above says "Keep int8 DQ/Qs in int16 -> int8", but if I'm not mistaken, this condition does the opposite. An int16 -> int8 conversion starts with a DequantizeLinear with an int16 zero point.

sspintel · 1 year ago

Fixed comment to clarify what is done

adrianlizarraga commented on 2024-05-13
Conversation is marked as resolved
onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc

adrianlizarraga · 1 year ago

This seems to be relying on an implementation detail of the EnsureUniqueDQForNodeUnit optimizer where duplicate DQ nodes get the name {original_dq_node_name}/duplicated. Is there a way to avoid this or make it more robust? Are you encountering models in which a duplicated DQ is a NodeUnit::Type::SingleNode instead of a NodeUnit::Type::QDQGroup?

sspintel · 1 year ago

As of now, I can't think of a better way to detect and reverse duplicate DQs. And yes, some duplicate DQs occur as SingleNodes in the graph and that's why this is required.

jywu-msft · 1 year ago

let's revisit this.

sspintel marked this pull request as ready for review 1 year ago
sspintel changed the title from "[Draft] QDQ stripping transformation in OpenVINO EP" to "QDQ stripping transformation in OpenVINO EP" 1 year ago
adrianlizarraga Draft code to remove Q/DQ ops from node units in OpenVINO EP
bfd35c96
adrianlizarraga remove unnecessary code
787e5128
adrianlizarraga Rename function, lintrunner
8590e6c9
sspintel Add rulesets for Q and DQ removal
47d48f65
sspintel Handle cases for unsupported QDQ targets
0e71fb4f
sspintel Detect and skip duplicated DQs to dst graph
96cccfb3
sspintel Add QDQ stripping to separate files
e059ff30
sspintel Fix resource access bug in duplicate DQ removal
49a2b603
sspintel Add extended rule sets for each Q and DQ in a NodeUnit
0cd32c47
sspintel Remove unreachable code + NPU can take FLOAT for unsupported initiali…
2788b200
sspintel Implement a better way to dump stripped models from OVEP
6b160078
sspintel Fix rulesets
257b0410
preetha-intel Add OV session option for PTQ model
f3c3bbe0
preetha-intel Enable qdq stripping only for PTQ models
9d78b6c4
sspintel Enable is_ptq for python APIs
f378f8ef
sspintel Fix to ignore unused initializers from dst graph
e3060ac2
sspintel Revert the logic and always keep initializers for nodes that are adde…
b46adeef
sspintel Rename flag to enable qdq optimizer; Fix bug in dst graph inputs orde…
4970fffd
sspintel Make enable_qdq_optimizer change in contexts.h
cc3dd38e
sspintel Enable Q ruleset for standalone Qs & Handle standalone duplicate DQs
09ba1291
sspintel Add check for QDQ model; Address PR review comments
e5344c2c
sspintel Dump graph name is unknown when input model is serialized
19a6af4c
sspintel Fix case of a StandAlone DQ feeding to a supported Op
e833cfe8
sspintel Verbose logging of qdq optimizer status and duration
351f74b3
sspintel Fix logging of qdq optimizer status
e156246e
sspintel Add standalone duplicate DQ DT check
4b9974b3
sspintel Fix for Linux build
c8c55cb3
sspintel Fix case when Qs have const init inputs
7b4acfa7
sspintel FIx review comments
96fc477b
sspintel Fix for Pad op with no dimensions
1e920b21
sspintel Formatting fix
22ae1a75
sfatimar Coverty Issues Fixed
980e0bd8
saurabhkale17 fix coverity issues
d62aaf25
sspintel Rewrite Q ruleset for Conv and MatMul
bf99ed28
sspintel Fix for node return type in debug mode
ed611659
sspintel Exception for dynamic shape models with qdq stripping
2e9bb817
sspintel Revert "Rewrite Q ruleset for Conv and MatMul"
2575b586
sspintel Fix lint issues
4d3f82ab
sspintel force pushed from 7efebd70 to 4d3f82ab 1 year ago
sspintel changed the title from "QDQ stripping transformation in OpenVINO EP" to "QDQ transformations in the OpenVINO EP for the NPU device" 1 year ago
sfatimar · 1 year ago

jywu-msft · 1 year ago

/azp run Linux OpenVINO CI Pipeline

azure-pipelines · 1 year ago

Azure Pipelines successfully started running 1 pipeline(s).

jywu-msft · 1 year ago

/azp run Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Android CI Pipeline, Linux Android Emulator QNN CI Pipeline

azure-pipelines · 1 year ago

Azure Pipelines successfully started running 10 pipeline(s).

sspintel Fix cpplint issues
f76fca45

jywu-msft · 1 year ago

/azp run Linux OpenVINO CI Pipeline

azure-pipelines · 1 year ago

Azure Pipelines successfully started running 1 pipeline(s).

jywu-msft · 1 year ago

/azp run Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Android CI Pipeline, Linux Android Emulator QNN CI Pipeline

azure-pipelines · 1 year ago

Azure Pipelines successfully started running 10 pipeline(s).

jywu-msft · 1 year ago

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

azure-pipelines · 1 year ago

Azure Pipelines successfully started running 10 pipeline(s).
jywu-msft approved these changes on 2024-05-24
jywu-msft merged 1765da17 into main 1 year ago
