onnxruntime fd3e0e84 - [x86] matmul8bit memory loading perf tuning (#24732)
Committed 352 days ago
### Description
Use aligned loads and preloading. This gives a roughly 10% speedup in token generation.

### Motivation and Context
Optimize performance.
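The description names two techniques without showing them. Here is a minimal, portable C sketch of both ideas. It uses an int8 dot product as a stand-in for the matmul inner loop. Buffers are allocated on a 32-byte boundary, which is the alignment an aligned AVX2 load such as `_mm256_load_si256` requires. The loop then loads block k+1 before computing on block k. `BLOCK`, `alloc_aligned_i8`, and `dot_i8_preload` are hypothetical names. This is not the code from #24732, and the scalar `memcpy` calls only stand in for SIMD register loads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical block width: 32 bytes, the size of one AVX2 register. */
#define BLOCK 32

/* Allocate a BLOCK-aligned int8 buffer, zero-padded up to a whole number of
   blocks so every full-width block load stays inside the allocation. */
static int8_t *alloc_aligned_i8(size_t n) {
    size_t padded = (n + BLOCK - 1) / BLOCK * BLOCK;
    if (padded == 0) padded = BLOCK;
    /* aligned_alloc requires the size to be a multiple of the alignment. */
    int8_t *p = aligned_alloc(BLOCK, padded);
    if (p) memset(p, 0, padded);
    return p;
}

/* Dot product of two int8 vectors with one-block-ahead preloading.
   The next block is copied into local buffers before the current block is
   multiplied; in a SIMD kernel these would be register loads that the CPU
   can overlap with the arithmetic on the current block. */
static int32_t dot_i8_preload(const int8_t *a, const int8_t *b, size_t n) {
    /* Both inputs must come from alloc_aligned_i8 (aligned and zero-padded). */
    assert((uintptr_t)a % BLOCK == 0 && (uintptr_t)b % BLOCK == 0);
    size_t nblocks = (n + BLOCK - 1) / BLOCK;
    if (nblocks == 0) return 0;

    int8_t cur_a[BLOCK], cur_b[BLOCK], nxt_a[BLOCK], nxt_b[BLOCK];
    memcpy(cur_a, a, BLOCK); /* preload block 0 */
    memcpy(cur_b, b, BLOCK);

    int32_t acc = 0;
    for (size_t blk = 0; blk < nblocks; ++blk) {
        int has_next = blk + 1 < nblocks;
        if (has_next) { /* issue loads for the next block first */
            memcpy(nxt_a, a + (blk + 1) * BLOCK, BLOCK);
            memcpy(nxt_b, b + (blk + 1) * BLOCK, BLOCK);
        }
        /* Compute on the block that is already loaded; padding bytes are zero. */
        for (int i = 0; i < BLOCK; ++i)
            acc += (int32_t)cur_a[i] * (int32_t)cur_b[i];
        if (has_next) { /* rotate: the preloaded block becomes current */
            memcpy(cur_a, nxt_a, BLOCK);
            memcpy(cur_b, nxt_b, BLOCK);
        }
    }
    return acc;
}
```

Zero-padding each allocation to a whole number of blocks lets the loop always issue full-width aligned loads without a separate tail path. The cost is up to `BLOCK - 1` extra bytes per buffer.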
References
#24732 - [x86] matmul8bit memory loading perf tuning
Author
fajin-corp
Parents
00bd398d