onnxruntime
[Quant tool] Improve performance of int4 weight quantization #20935 (Merged)
adrianlizarraga merged 7 commits into main from adrianl/quant-tool-int4-perf
Improve latency of code path that quantizes int4 weights (ab293850)
Add unit test for pack_bytes_to_4bit() (7fae1fcf)
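The pack_bytes_to_4bit() commit concerns storing two 4-bit values per byte. As a rough illustration only, here is a minimal NumPy sketch of vectorized int4 packing and unpacking. The helper names pack_int4_pairs and unpack_int4_pairs are hypothetical, and the low-nibble-first layout with zero padding for odd lengths is an assumption (believed to match ONNX's INT4 tensor layout); this is not the code merged in this PR.

```python
import numpy as np


def pack_int4_pairs(values) -> bytes:
    """Hypothetical helper: pack signed int4 values two per byte.

    Assumes values lie in [-8, 7] and that the first element of each
    pair goes in the low nibble (assumed ONNX-style layout).
    """
    arr = np.asarray(values).ravel()
    if arr.size and (arr.min() < -8 or arr.max() > 7):
        raise ValueError("int4 values must be in [-8, 7]")
    flat = arr.astype(np.int8)
    if flat.size % 2:
        # Pad odd-length input with a zero so every byte holds two nibbles.
        flat = np.append(flat, np.int8(0))
    # Reinterpret as unsigned and keep the low 4 bits (two's complement nibble).
    nibbles = flat.astype(np.uint8) & 0x0F
    # Vectorized packing: even indices -> low nibble, odd indices -> high nibble.
    packed = nibbles[0::2] | (nibbles[1::2] << 4)
    return packed.tobytes()


def unpack_int4_pairs(data: bytes, count: int) -> np.ndarray:
    """Hypothetical inverse of pack_int4_pairs; returns `count` int8 values."""
    packed = np.frombuffer(data, dtype=np.uint8)
    nibbles = np.empty(packed.size * 2, dtype=np.uint8)
    nibbles[0::2] = packed & 0x0F  # low nibble holds the first element
    nibbles[1::2] = packed >> 4    # high nibble holds the second element
    signed = nibbles.astype(np.int8)
    # Sign-extend: nibble values 8..15 represent -8..-1.
    signed[signed > 7] -= 16
    return signed[:count]


packed = pack_int4_pairs([1, -1, 7])
print(packed)                            # b'\xf1\x07'
print(unpack_int4_pairs(packed, 3))      # [ 1 -1  7]
```

Operating on whole arrays with slicing and bitwise operators avoids a per-element Python loop, which is the usual way to speed up this kind of packing in NumPy.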
No need to call flatten().tobytes() (362936b5)
adrianlizarraga added the ep:QNN label
Add comment about onnx's int4 and uint4 np.dtypes (3559d711)
adrianlizarraga marked this pull request as ready for review (2 years ago)
adrianlizarraga requested a review from yufenglee (2 years ago)
adrianlizarraga requested a review from jywu-msft (2 years ago)
adrianlizarraga requested a review from HectorSVC (2 years ago)
adrianlizarraga added the quantization and python labels
More unit tests and cleanup (94b2afe0)
Merge branch 'main' into adrianl/quant-tool-int4-perf (9585b0d1)
Better test input data with larger range (385715ea)
yufenglee requested a review from fajin-corp (2 years ago)
yufenglee approved these changes on 2024-06-05
fajin-corp commented on 2024-06-05
fajin-corp approved these changes on 2024-06-05
adrianlizarraga merged commit df28c7d7 into main (2 years ago)
adrianlizarraga deleted the adrianl/quant-tool-int4-perf branch (2 years ago)
Reviewers: fajin-corp, yufenglee, jywu-msft, HectorSVC
Assignees: No one assigned
Labels: quantization, python, ep:QNN
Milestone: No milestone