PR #21171 [QNN EP] Initial INT4 support

adrianlizarraga merged 61 commits into main from adrianl/qnn-int4

Update include/framework/ with int4

40da6791

Update onnxruntime_c_api.h with int4 type

e3e8a6bd

Update cpu_contrib_kernels.cc with int4 Q/DQ

5e01e0f6

Update framework/data_types.cc with int4 types

44e0e023

Update onnxruntime map type info with int4 types

ce03eb22

Update Tensor methods to calc int4 tensor data size

e1590078

Update function to map tensor_proto int4 to onnxruntime enum

46c3d0d6

Update tensorprotoutils to handle int4 protobufs

583dae1e

Add functions to map Int4x2 to an onnxruntime tensor type enum

0009a47c

Update com.microsoft.DequantizeLinear schema to support int4 types fo…

d11a3d49

Add option to disable int4 type in Conv and MatMul qdq node group sel…

208c4037

Add DequantizeLinear with int4 support (missing block quant)

f91ae697

update transpose helper to support int4

7323793a

Update provider bridge with int4 apis

8c79905a

Update quantizer tool with int4

fc695caa

Remove duplicate enum

eeacb78b

Remove MatMulSelector constructor arg

6f9da045

Remove unnecessary explicit template instantiation

7e8c4588

Add static_cast

3b7ed5f3

Add temporary CPU EP Int4 test (qdq conv)

c7086a5e

Update operator docs

34dfa171

Update testing version of tensorprotoutils with int4 helpers

e7bec9cf

Run lintrunner

cd8912e0

Fix api to create int4 ort value

4cf3a751

Wrap long lines in tensorprotoutils

ca785c2a

Add operator unit tests for Dequant int4/uint4

f87e785c

Remove comments

d028f2f7

Add QuantizeLinear int4 impl

24cc6172

Update operator docs

f35b09ef

Disable potentially bugged onnx tests

e33f198e

Add TODO username

10f28aa7

Fix warning as error and clean up

de1ded4c

Merge branch 'main' into adrianl/dq-transpose-int4

378718ea

Mlas kernels to quantize int4 (not blocked). Missing powerpc

746312b8

branchless update of 4-bit element

2935f797

more branchless update of int4 lane

807537cd

Fix cast warning as error

a36a128d

Remove decrement of N

bc445571

Clean up Int4x2 class

f40992dc

Github linter fixes

d0e17e21

Remove temporary unittest

6568d48a

Case statement missing :

b8d5869f

Merge branch 'adrianl/dq-transpose-int4' into adrianl/qnn-int4

80dadc5c

Ported over INT4 work for qnn ep

b01f52c4

Add debug API to qnn's get_qnn_qdq_config() for int4

35d3fa5e

lintrunner

a6a33976

Fix use of int4 qdq options

1a528f11

Merge main and fix conflicts

58044f93

Remove unnecessary changes

168f4e04

Workaround that dels calibrator.model

7ede88b9

Merge branch 'main' into adrianl/qnn-int4

ca6e5f92

Add Conv int4 per-channel test

4b2af75e

Merge main

578d47da

Clean up

9770a97b

Merge branch 'main' into adrianl/qnn-int4

eb9687a4

Run lintrunner

92efb6d5

Add more comments

05fee606

Merge branch 'main' into adrianl/qnn-int4

def223dd

Fix line lengths and TODO linter warnings

87d84399

adrianlizarraga added ep:QNN

adrianlizarraga marked this pull request as ready for review 1 year ago

adrianlizarraga requested a review from HectorSVC 1 year ago

adrianlizarraga requested a review from jywu-mysoft 1 year ago

Merge branch 'main' into adrianl/qnn-int4

618bdf96

Do not run QNN Int4 model test on windows x64

2d163a4e

HectorSVC approved these changes on 2024-07-09

adrianlizarraga merged 5753f8da into main 1 year ago

adrianlizarraga deleted the adrianl/qnn-int4 branch 1 year ago

Reviewers

HectorSVC

jywu-mysoft

Assignees

No one assigned

Labels

ep:QNN

Milestone

No milestone
