onnxruntime
fd3e0e84 - [x86] matmul8bit memory loading perf tuning (#24732)

Commit

352 days ago

[x86] matmul8bit memory loading perf tuning (#24732)

### Description
Use aligned loads and preloading. This gives a ~10% speedup in token generation.

### Motivation and Context
Optimize performance.
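The commit message only names the two techniques. The sketch below is a rough illustration of what "aligned load and preloading" can mean in an 8-bit kernel. It is written in C with SSE2 intrinsics and treats "preloading" as software prefetching, which is an assumption. It is not the ONNX Runtime code from this commit. The function names and `PREFETCH_DISTANCE` are hypothetical. It computes a single int8 dot product (the inner loop of a matmul), not a full matrix multiply, and it assumes both buffers are 16-byte aligned with a length that is a multiple of 16.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2: _mm_load_si128, _mm_madd_epi16, ... */
#include <xmmintrin.h> /* _mm_prefetch, _MM_HINT_T0 */
#endif

/* Hypothetical prefetch distance in bytes; a real kernel would tune this per CPU. */
#define PREFETCH_DISTANCE 256

/* Scalar reference: dot product of two int8 vectors. */
static int32_t dot_i8_scalar(const int8_t *a, const int8_t *b, size_t n) {
    int32_t sum = 0;
    for (size_t i = 0; i < n; ++i) sum += (int32_t)a[i] * (int32_t)b[i];
    return sum;
}

/* Int8 dot product using aligned 16-byte loads plus software prefetch.
 * Assumed preconditions: a and b are 16-byte aligned and n % 16 == 0. */
static int32_t dot_i8_aligned(const int8_t *a, const int8_t *b, size_t n) {
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    for (size_t i = 0; i < n; i += 16) {
        /* Hint data that will be needed a few iterations from now into cache. */
        if (i + PREFETCH_DISTANCE < n) {
            _mm_prefetch((const char *)(a + i + PREFETCH_DISTANCE), _MM_HINT_T0);
            _mm_prefetch((const char *)(b + i + PREFETCH_DISTANCE), _MM_HINT_T0);
        }
        /* Aligned loads (movdqa); these fault if a pointer is not 16-byte aligned. */
        __m128i va = _mm_load_si128((const __m128i *)(a + i));
        __m128i vb = _mm_load_si128((const __m128i *)(b + i));
        /* Sign-extend int8 -> int16: duplicate each byte into a 16-bit lane,
         * then arithmetic-shift right by 8. */
        __m128i a_lo = _mm_srai_epi16(_mm_unpacklo_epi8(va, va), 8);
        __m128i a_hi = _mm_srai_epi16(_mm_unpackhi_epi8(va, va), 8);
        __m128i b_lo = _mm_srai_epi16(_mm_unpacklo_epi8(vb, vb), 8);
        __m128i b_hi = _mm_srai_epi16(_mm_unpackhi_epi8(vb, vb), 8);
        /* Multiply int16 pairs and add adjacent products into int32 lanes. */
        acc = _mm_add_epi32(acc, _mm_madd_epi16(a_lo, b_lo));
        acc = _mm_add_epi32(acc, _mm_madd_epi16(a_hi, b_hi));
    }
    /* Horizontal sum of the four int32 lanes. */
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
#else
    /* Non-SSE2 targets fall back to the scalar loop. */
    return dot_i8_scalar(a, b, n);
#endif
}
```

`_mm_load_si128` requires 16-byte alignment (unlike `_mm_loadu_si128`), so callers must provide buffers declared with `_Alignas(16)` or obtained from `aligned_alloc(16, ...)`. In exchange, the hardware never has to handle loads that split a cache line. Prefetching only helps when the distance is large enough to hide memory latency but small enough that the lines are not evicted before use.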

References

#24732 - [x86] matmul8bit memory loading perf tuning

Author

fajin-corp

Parents

00bd398d