onnxruntime
[Quant tool] Improve performance of int4 weight quantization
#20935
Merged

adrianlizarraga
adrianlizarraga Improve latency of code path that quantizes int4 weights
ab293850
adrianlizarraga Add unit test for pack_bytes_to_4bit()
7fae1fcf
adrianlizarraga No need to call flatten().tobytes()
362936b5
adrianlizarraga added ep:QNN
adrianlizarraga Add comment about onnx's int4 and uint4 np.dtypes
3559d711
adrianlizarraga marked this pull request as ready for review 2 years ago
adrianlizarraga requested a review from yufenglee 2 years ago
adrianlizarraga requested a review from jywu-msft 2 years ago
adrianlizarraga requested a review from HectorSVC 2 years ago
adrianlizarraga added quantization
adrianlizarraga added python
adrianlizarraga More unit test and clean up
94b2afe0
adrianlizarraga Merge branch 'main' into adrianl/quant-tool-int4-perf
9585b0d1
adrianlizarraga Better test input data with larger range
385715ea
yufenglee requested a review from fajin-corp 2 years ago
yufenglee approved these changes on 2024-06-05
fajin-corp commented on 2024-06-05
fajin-corp approved these changes on 2024-06-05
adrianlizarraga merged df28c7d7 into main 2 years ago
adrianlizarraga deleted the adrianl/quant-tool-int4-perf branch 2 years ago

Assignees
No one assigned