Quantization tool: support float 8 with MatMul, support float 16 weights #18043
Update quantization tools to support MatMul with float 8
f09ad3cb
support float16
bd47221e
more consistent with types
23c9e393
Merge branch 'main' of https://github.com/microsoft/onnxruntime into …
7e7d0fe2
fix types
c073b88b
fix many unit tests
35db1499
Merge branch 'main' of https://github.com/microsoft/onnxruntime into …
799f7c54
fix conversion, rounding
3a14a313
new fixes
8199196a
Merge branch 'main' of https://github.com/microsoft/onnxruntime into …
87e2ba17
fix softmax qdq
6003524f
Merge branch 'main' of https://github.com/microsoft/onnxruntime into …
aa11d25b
fix shape info
0e966688
update test
170e5c9b
fix remaining unit tests
47b41b60
add value_info
b15ffcdd
Merge branch 'main' of https://github.com/microsoft/onnxruntime into …
01636994
add subtest
34b7a388
refactoring onnxruntime/test/python/quantization/test_op_matmul.py
8711792d
disable f16 for old onnx package
4806a041
disable f16 unit tests
260dd59e
support for Conv and float 16
d376f664
extend unit test for Conv
76d92843
fix lint
d2f9294b
xadupre
marked this pull request as ready for review 2 years ago
Merge branch 'main' of https://github.com/microsoft/onnxruntime into …
6702b81f
change the disable condition
a6433e88
lint
301c435f
final fix
1e70b3e0
Merge branch 'main' of https://github.com/microsoft/onnxruntime into …
9049bcd2
Merge branch 'main' of https://github.com/microsoft/onnxruntime into …
e4f0415b
fix merge conflicts
47efb3e0
fix merge conflict
fcf40b40
fix missing min_real_range
bf44eba0
merge conflicts
81528cc7
fix constant
63b84d19
fix missing dtype
a90f9f5f
Merge branch 'main' of https://github.com/microsoft/onnxruntime into …
0164a38d
use np arrays
e6c39f4c
Merge branch 'main' of https://github.com/microsoft/onnxruntime into …
1c8ae860
improve robustness
1185de09
fix type issue
aee75fff
fix wrong types
63a8ea90
fix one bug
1948c4ad
fix dtype issue
67bab544
Merge branch 'main' of https://github.com/microsoft/onnxruntime into …
fe1d0fe0
better error message
1d49bc51
Merge branch 'main' of https://github.com/microsoft/onnxruntime into …
fc406f9d
yufenglee
approved these changes
on 2024-01-11
xadupre
merged
c8399a81
into main 2 years ago
xadupre
deleted the qdqmm branch 1 year ago