onnxruntime
[Quant tool] Improve performance of int4 weight quantization
#20935

Merged

adrianlizarraga merged 7 commits into main from adrianl/quant-tool-int4-perf

Improve latency of code path that quantizes int4 weights

ab293850

Add unit test for pack_bytes_to_4bit()

7fae1fcf
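
For readers unfamiliar with 4-bit packing, here is a minimal sketch of the general idea the commit title refers to: storing two 4-bit values in each output byte. It is not the code from this PR. The function name `pack_nibbles_sketch` is hypothetical, and the layout (first element in the low nibble, second in the high nibble, zero padding for an odd count) is an assumption based on how ONNX stores int4 tensor data.

```python
def pack_nibbles_sketch(src: bytes) -> bytearray:
    """Pack bytes that each hold one 4-bit value into half as many bytes.

    Hypothetical helper for illustration only. Assumed layout: even-indexed
    elements fill the low nibble, odd-indexed elements fill the high nibble.
    """
    # Round up so an odd number of inputs still gets a final (padded) byte.
    dst = bytearray((len(src) + 1) // 2)
    for i, value in enumerate(src):
        nibble = value & 0x0F  # keep only the low 4 bits of each input byte
        if i % 2 == 0:
            dst[i // 2] |= nibble        # low nibble
        else:
            dst[i // 2] |= nibble << 4   # high nibble
    return dst


packed = pack_nibbles_sketch(bytes([1, 2, 3]))
```

In this sketch a signed value such as -1, stored as the byte 0xFF, contributes the nibble 0xF, so the same routine covers both signed and unsigned 4-bit inputs as long as they arrive as raw bytes.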

No need to call flatten().tobytes()

362936b5

adrianlizarraga added ep:QNN

Add comment about onnx's int4 and uint4 np.dtypes

3559d711

adrianlizarraga marked this pull request as ready for review 2 years ago

adrianlizarraga requested a review from yufenglee 2 years ago

adrianlizarraga requested a review from jywu-msft 2 years ago

adrianlizarraga requested a review from HectorSVC 2 years ago

adrianlizarraga added quantization

adrianlizarraga added python

More unit test and clean up

94b2afe0

Merge branch 'main' into adrianl/quant-tool-int4-perf

9585b0d1

Better test input data with larger range

385715ea

yufenglee requested a review from fajin-corp 2 years ago

yufenglee approved these changes on 2024-06-05

fajin-corp commented on 2024-06-05

fajin-corp approved these changes on 2024-06-05

adrianlizarraga merged df28c7d7 into main 2 years ago

adrianlizarraga deleted the adrianl/quant-tool-int4-perf branch 2 years ago

Reviewers

fajin-corp

yufenglee

jywu-msft

HectorSVC

Assignees

No one assigned

Labels

quantization python ep:QNN

Milestone

No milestone