onnxruntime
QDQ transformations in the OpenVINO EP for the NPU device
#20622
Merged


sspintel · 1 year ago (edited)

We introduce rulesets that eliminate QDQ nodes of unsupported types, as well as QDQ nodes around unsupported quantized operators, for the NPU device. This leads to improved performance and accuracy on critical client AI models.

Here's a summary of the changes:

  • Introduces the provider option enable_qdq_optimizer which, when set to True, enables stripping of QDQ nodes on the NPU device for models that contain QuantizeLinear and DequantizeLinear layers. enable_qdq_optimizer defaults to False.
  • Always strips out int16/uint16 QDQ layers, as these types are not supported by the NPU compiler.
  • Retains QDQ layers only around the supported ops Conv, MatMul, and Add, which were specifically identified for optimal inference performance. The OpenVINO EP achieves this by iterating through the NodeUnits in the QDQ model and reconstructing the graph with only the required layers.
  • Adds provider APIs (by @adrianlizarraga) to manipulate node units from EP code.
  • Adds a capability rule for the Pad operator when it takes DQ layers as input.
  • Applies fixes from a static code analysis tool.
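As a rough illustration of the ruleset above (a hypothetical, self-contained sketch, not the EP's actual code; the name KeepQDQ is invented here), the decision of whether Q/DQ layers survive stripping can be thought of as:

```cpp
#include <set>
#include <string>

// ONNX TensorProto_DataType enum values for the 16-bit integer types.
constexpr int kUint16 = 4;
constexpr int kInt16 = 5;

// Hypothetical predicate: keep Q/DQ layers only around ops the NPU supports
// in quantized form, and always drop 16-bit integer Q/DQ, which the NPU
// compiler cannot consume.
bool KeepQDQ(const std::string& target_op, int zp_elem_type) {
  static const std::set<std::string> kSupportedQuantOps = {"Conv", "MatMul", "Add"};
  if (zp_elem_type == kUint16 || zp_elem_type == kInt16) return false;
  return kSupportedQuantOps.count(target_op) > 0;
}
```

The real implementation works at NodeUnit granularity rather than per-op-type strings, but the two rules (drop 16-bit types, keep only a small allowlist of target ops) are the ones described in this PR.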
jywu-msft requested a review from skottmckay 1 year ago
jywu-msft requested a review from adrianlizarraga 1 year ago
jywu-msft requested a review from jywu-msft 1 year ago
jywu-msft requested a review from HectorSVC 1 year ago
github-advanced-security commented on 2024-05-09
onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.h
// Copyright (C) Intel Corporation

github-advanced-security · 1 year ago

CLANGFORMAT/format

See https://clang.llvm.org/docs/ClangFormat.html.
Run lintrunner -a to apply this patch.


onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc
// Copyright (C) Intel Corporation

github-advanced-security · 1 year ago

CLANGFORMAT/format

See https://clang.llvm.org/docs/ClangFormat.html.
Run lintrunner -a to apply this patch.


skottmckay commented on 2024-05-10
Conversation is marked as resolved
onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc
static ONNX_NAMESPACE::TensorProto_DataType GetZeroPointDT(const Node* qdq_node) {
  return static_cast<ONNX_NAMESPACE::TensorProto_DataType>(
      qdq_node->InputDefs().at(2)->TypeAsProto()->tensor_type().elem_type());

skottmckay · 1 year ago

Optional inputs aren't guaranteed to exist, so it would be safer to check that InputDefs().size() > 2 before getting the input def at that slot.

But given the zp input may not exist, you may need to check the output type of a Q node and the input 0 type of a DQ instead of relying on the zp input for that type info.

sspintel · 1 year ago

Fixed. Now returning output type of Q and input type of DQ.
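The fixed approach can be sketched with a self-contained mock (the Node/NodeArg structs below are illustrative stand-ins, not ORT's real classes): because the zero-point input at slot 2 is optional, the quantized type is read from a Q node's output, or from a DQ node's input 0, instead.

```cpp
#include <string>
#include <vector>

// Illustrative stand-ins for ORT's NodeArg/Node, carrying only an element type.
struct NodeArg { int elem_type; };
struct Node {
  std::string op_type;
  std::vector<NodeArg> inputs;
  std::vector<NodeArg> outputs;
};

// For QuantizeLinear the quantized type appears on the output; for
// DequantizeLinear it appears on input 0. Neither depends on the optional
// zero-point input, which may be absent entirely.
int GetQDQDataType(const Node& n) {
  return (n.op_type == "QuantizeLinear") ? n.outputs.at(0).elem_type
                                         : n.inputs.at(0).elem_type;
}
```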

Conversation is marked as resolved
onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc

skottmckay · 1 year ago

nit: a TypeProto on the stack could be used with GetOrCreateNodeArg being called with the address of that.

sspintel · 1 year ago

Can you please elaborate on this?

skottmckay · 1 year ago

Sorry - forgot you're using an external library for the EP so you can't create protobuf types directly.

Conversation is marked as resolved
onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc
// Copy the original quantized type proto, but update the type to float.
std::unique_ptr<ONNX_NAMESPACE::TypeProto> type_proto = ONNX_NAMESPACE::TypeProto::Create();
type_proto->copy_from(orig_type_proto);
type_proto->mutable_tensor_type()->set_elem_type(ONNX_NAMESPACE::TensorProto_DataType_FLOAT);

skottmckay · 1 year ago

Is it valid to hardcode float given the ONNX spec allows for float/float16/bfloat16? Maybe this code would never be hit for the 16-bit float types.

Conversation is marked as resolved
onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc
for (Node::NodeConstIterator it_dq = target_node->InputNodesBegin(); it_dq != target_node->InputNodesEnd(); ++it_dq) {
  const auto& DQ = &*it_dq;
  if (DQ->OpType() != "DequantizeLinear") continue;
  is_bias |= DQ->InputDefs().at(0)->Name().find("bias") != std::string::npos;

skottmckay · 1 year ago

Looking for an input called 'bias' seems very model specific. Is there some other component that ensures there's a single 'bias' value for the graph and it has that exact name?

Naively I would have expected this check to come from say a Conv node using the bias input's value name from the node instead of a hardcoded name of "bias".

sspintel · 1 year ago

The rulesets changed to not use bias anymore, so I've removed the functions which were using it earlier.

Conversation is marked as resolved
onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc
  }
}

// Used to find if inputs of the target node DQ's are constant initializers

skottmckay · 1 year ago

nit: the scale and zp inputs of a DQ are generally constant initializers so it might be good to clarify the comment to say the check is for input 0 of the DQ being a constant initializer.

sspintel · 1 year ago

Fixed.

Conversation is marked as resolved
onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc
}

// DQs in Double QDQ cases should be kept
if (dq_node->InputDefs().at(2)->Name().find("zero_point_convert") != std::string::npos &&

skottmckay · 1 year ago

need to check the optional input exists

sspintel · 1 year ago

Now, checking for scale_convert instead of zero_point_convert.

Conversation is marked as resolved
onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc

skottmckay · 1 year ago 👍 1

@adrianlizarraga NodeUnit should provide OutputEdgeCount() so this is unnecessary

sspintel · 1 year ago

Thanks for the comments @skottmckay. I will address them. Meanwhile, a question: is there an easy way to check if a given model is QDQ and only apply this transformation if that's the case (other than iterating through the whole graph and checking if a QDQ operator exists)? We don't want to build the graph from scratch for non-QDQ models.

skottmckay · 1 year ago

> Thanks for the comments @skottmckay. I will address them. Meanwhile, a question. Is there an easy way to check if a given model is QDQ and only apply this transformation if that's the case (other than iterating through the whole graph and checking if a QDQ operator exists), as we don't want to build the graph from scratch for non-QDQ models.

I'm not aware of any direct way.

If it isn't a QDQ model the number of nodes returned by GetAllNodeUnits should be equal to the number of nodes in the GraphViewer so you could potentially infer using that.

But calling GetAllNodeUnits is going to be more expensive than iterating and checking if any node's op_type is DQ/Q as a first step so it depends what you want to optimize for.

sspintel · 1 year ago


I'm iterating and checking if any node's op_type is DQ/Q instead, and doing QDQ stripping only if it's true.
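That first-pass check amounts to a single linear scan over the nodes; a minimal self-contained sketch (with a hypothetical Node struct standing in for the graph API):

```cpp
#include <string>
#include <vector>

struct Node { std::string op_type; };  // illustrative stand-in for ORT's Node

// Cheap pre-check: treat the model as QDQ only if some node is a
// QuantizeLinear or DequantizeLinear. This is cheaper than calling
// GetAllNodeUnits, which would build every node unit just to count them.
bool IsQDQGraph(const std::vector<Node>& nodes) {
  for (const auto& n : nodes) {
    if (n.op_type == "QuantizeLinear" || n.op_type == "DequantizeLinear")
      return true;
  }
  return false;
}
```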

adrianlizarraga commented on 2024-05-13
Conversation is marked as resolved
onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc
// Copy the original quantized type proto, but update the type to float.
std::unique_ptr<ONNX_NAMESPACE::TypeProto> type_proto = ONNX_NAMESPACE::TypeProto::Create();
type_proto->copy_from(orig_type_proto);
type_proto->mutable_tensor_type()->set_elem_type(ONNX_NAMESPACE::TensorProto_DataType_FLOAT);

adrianlizarraga · 1 year ago

You could get the actual float type from the type of the scale in the Q (or DQ) op.

One way to do this is to get scale initializer's name -> get initializer TensorProto -> get the float type

const auto& src_initializers = src_graph.GetAllInitializedTensors();
const std::string& scale_initializer_name = io_def.quant_param->scale.Name();
auto tensor_proto_iter = src_initializers.find(scale_initializer_name);

// Should check that it exists. Maybe change signature of this function to return a Status.
// ORT_RETURN_IF(tensor_proto_iter == src_initializers.end(), "Unable to find scale initializer ", scale_initializer_name);

const ONNX_NAMESPACE::TensorProto* scale_tensor_proto = tensor_proto_iter->second;
int32_t float_type = scale_tensor_proto->data_type();
sspintel · 1 year ago

Fixed

adrianlizarraga commented on 2024-05-13
Conversation is marked as resolved
onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc
auto q_zero_point_dt = GetQDQDataType(&q_node);

// Can ignore if this Q is uint16 as it won't be consumed by any supported node
if (q_zero_point_dt != ONNX_NAMESPACE::TensorProto_DataType_UINT16 &&

adrianlizarraga · 1 year ago (edited)

Looks like this should check for both UINT16 and INT16 here and in other places.

sspintel · 1 year ago

Fixed
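The fix boils down to testing both 16-bit integer types. As a sketch (the helper name is invented here), using the ONNX TensorProto_DataType values UINT16 = 4 and INT16 = 5:

```cpp
// ONNX TensorProto_DataType enum values (from the ONNX protobuf spec).
constexpr int kTensorProtoUint16 = 4;
constexpr int kTensorProtoInt16 = 5;

// Hypothetical helper mirroring the fix: a Q/DQ of either 16-bit integer
// type is unsupported by the NPU compiler and can be stripped or ignored.
bool Is16BitQuantType(int elem_type) {
  return elem_type == kTensorProtoUint16 || elem_type == kTensorProtoInt16;
}
```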

adrianlizarraga commented on 2024-05-13
Conversation is marked as resolved
onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc
if (i_q_node.OpType() == "QuantizeLinear" && o_q_node.OpType() == "QuantizeLinear") {
  auto dq_zero_point_dt = GetQDQDataType(&dq_node);

  if (dq_zero_point_dt != ONNX_NAMESPACE::TensorProto_DataType_UINT16 &&

adrianlizarraga · 1 year ago

The comment above says "Keep int8 DQ/Qs in int16 -> int8", but if I'm not mistaken, this condition does the opposite. An int16 -> int8 conversion starts with a DequantizeLinear with an int16 zero point.

sspintel · 1 year ago

Fixed comment to clarify what is done

adrianlizarraga commented on 2024-05-13
Conversation is marked as resolved
onnxruntime/core/providers/openvino/qdq_transformations/qdq_stripping.cc

adrianlizarraga · 1 year ago

This seems to be relying on an implementation detail of the EnsureUniqueDQForNodeUnit optimizer where duplicate DQ nodes get the name {original_dq_node_name}/duplicated. Is there a way to avoid this or make it more robust? Are you encountering models in which a duplicated DQ is a NodeUnit::Type::SingleNode instead of a NodeUnit::Type::QDQGroup?

sspintel · 1 year ago

As of now, I can't think of a better way to detect and reverse duplicate DQs. And yes, some duplicate DQs occur as SingleNodes in the graph and that's why this is required.

jywu-msft · 1 year ago

let's revisit this.

sspintel marked this pull request as ready for review 1 year ago
sspintel changed the title from "[Draft] QDQ stripping transformation in OpenVINO EP" to "QDQ stripping transformation in OpenVINO EP" 1 year ago
adrianlizarraga Draft code to remove Q/DQ ops from node units in OpenVINO EP
bfd35c96
adrianlizarraga remove unnecessary code
787e5128
adrianlizarraga Rename function, lintrunner
8590e6c9
sspintel Add rulesets for Q and DQ removal
47d48f65
sspintel Handle cases for unsupported QDQ targets
0e71fb4f
sspintel Detect and skip duplicated DQs to dst graph
96cccfb3
sspintel Add QDQ stripping to separate files
e059ff30
sspintel Fix resource access bug in duplicate DQ removal
49a2b603
sspintel Add extended rule sets for each Q and DQ in a NodeUnit
0cd32c47
sspintel Remove unreachable code + NPU can take FLOAT for unsupported initiali…
2788b200
sspintel Implement a better way to dump stripped models from OVEP
6b160078
sspintel Fix rulesets
257b0410
preetha-intel Add OV session option for PTQ model
f3c3bbe0
preetha-intel Enable qdq stripping only for PTQ models
9d78b6c4
sspintel Enable is_ptq for python APIs
f378f8ef
sspintel Fix to ignore unused initializers from dst graph
e3060ac2
sspintel Revert the logic and always keep initializers for nodes that are adde…
b46adeef
sspintel Rename flag to enable qdq optimizer; Fix bug in dst graph inputs orde…
4970fffd
sspintel Make enable_qdq_optimizer change in contexts.h
cc3dd38e
sspintel Enable Q ruleset for standalone Qs & Handle standalone duplicate DQs
09ba1291
sspintel Add check for QDQ model; Address PR review comments
e5344c2c
sspintel Dump graph name is unknown when input model is serialized
19a6af4c
sspintel Fix case of a StandAlone DQ feeding to a supported Op
e833cfe8
sspintel Verbose logging of qdq optimizer status and duration
351f74b3
sspintel Fix logging of qdq optimizer status
e156246e
sspintel Add standalone duplicate DQ DT check
4b9974b3
sspintel Fix for Linux build
c8c55cb3
sspintel Fix case when Qs have const init inputs
7b4acfa7
sspintel FIx review comments
96fc477b
sspintel Fix for Pad op with no dimensions
1e920b21
sspintel Formatting fix
22ae1a75
sfatimar Coverty Issues Fixed
980e0bd8
saurabhkale17 fix coverity issues
d62aaf25
sspintel Rewrite Q ruleset for Conv and MatMul
bf99ed28
sspintel Fix for node return type in debug mode
ed611659
sspintel Exception for dynamic shape models with qdq stripping
2e9bb817
sspintel Revert "Rewrite Q ruleset for Conv and MatMul"
2575b586
sspintel Fix lint issues
4d3f82ab
sspintel force pushed from 7efebd70 to 4d3f82ab 1 year ago
sspintel changed the title from "QDQ stripping transformation in OpenVINO EP" to "QDQ transformations in the OpenVINO EP for the NPU device" 1 year ago
sfatimar · 1 year ago

jywu-msft · 1 year ago

/azp run Linux OpenVINO CI Pipeline

azure-pipelines · 1 year ago

Azure Pipelines successfully started running 1 pipeline(s).

jywu-msft · 1 year ago

/azp run Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Android CI Pipeline, Linux Android Emulator QNN CI Pipeline

azure-pipelines · 1 year ago

Azure Pipelines successfully started running 10 pipeline(s).

sspintel Fix cpplint issues
f76fca45

jywu-msft · 1 year ago

/azp run Linux OpenVINO CI Pipeline

azure-pipelines · 1 year ago

Azure Pipelines successfully started running 1 pipeline(s).

jywu-msft · 1 year ago

/azp run Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Android CI Pipeline, Linux Android Emulator QNN CI Pipeline

azure-pipelines · 1 year ago

Azure Pipelines successfully started running 10 pipeline(s).

jywu-msft · 1 year ago

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

azure-pipelines · 1 year ago

Azure Pipelines successfully started running 10 pipeline(s).
jywu-msft approved these changes on 2024-05-24
jywu-msft merged 1765da17 into main 1 year ago
