Updating QDQ to support Float8E4M3FN #16550
Updating QDQ to support float8 types
724b4af6
lint
d99167ab
Merge branch 'main' of https://github.com/microsoft/onnxruntime into qdq
a566de15
improves quantization tests
409aea11
Merge branch 'main' of https://github.com/microsoft/onnxruntime into qdq
67b1fdfb
lint
603c0701
lint + bug
c9445602
Merge branch 'main' of https://github.com/microsoft/onnxruntime into qdq
eef744ed
fix quantization tests
05b6f313
Merge branch 'main' of https://github.com/microsoft/onnxruntime into qdq
b8621257
quantization
8531cf8d
better comments
46b39bbb
lint
ec21185f
Update quantization tool
5ac9e33c
Merge branch 'main' of https://github.com/microsoft/onnxruntime into qdq
ccec94d7
fix test for int, uint
8bbe3534
remove debug
47625fc9
lint
1a50b423
Fix quantization for bias
759cdc04
night commit
1b9fe1e7
still missing one step
54aace97
remove test on version
b0fd4123
lint
687a7afb
lint
a473cbf8
replace DequantizeLinear by Cast
6f6571dd
lint
fc0b6abf
fix QuantizeLinear for float16
cb394b15
fix qgemm
59cfb1b0
Merge branch 'qdq' of https://github.com/xadupre/onnxruntime into qdq
a09f085a
lint
54949ff2
lint
a97bdafd
fix unit test for quantization
f0625fc9
lint
216966fe
skip reference evaluator for one test
6982747f
fix misspelling used to debug
bd12e830
disable test when onnx not recent enough
e59bb7a4
fix remaining unit tests
f9dc10e5
xadupre marked this pull request as ready for review 2 years ago
lint
4c25922b
lint
987bf680
yufenglee approved these changes on 2023-08-08
xadupre merged d0316ee7 into main 2 years ago
xadupre deleted the qdq branch 1 year ago
Assignees
No one assigned