onnxruntime
Implement FlashAttention for CPU
#20805
Merged
Commits (36)
Register new contrib op FlashAttention (duanqn, committed 1 year ago)
Move getenv to constructor (duanqn, committed 1 year ago)
Get Env (duanqn, committed 1 year ago)
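The "Move getenv to constructor" commit points at an environment-variable toggle that is read once when the kernel is constructed instead of on every Compute call (later commits flip whether FlashAttention is on by default). A minimal sketch of that pattern; the variable name, class name, and flag semantics are placeholders, not the PR's actual identifiers:

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical kernel class illustrating the "read getenv once in the
// constructor" pattern; names are placeholders, not the PR's actual code.
class AttentionCpuKernel {
 public:
  AttentionCpuKernel() {
    // Query the toggle a single time when the kernel is constructed,
    // rather than on every Compute() invocation.
    const char* flag = std::getenv("ORT_HYPOTHETICAL_DISABLE_CPU_FLASH_ATTENTION");
    disable_flash_attention_ = (flag != nullptr && std::strcmp(flag, "1") == 0);
  }

  bool UseFlashAttention() const { return !disable_flash_attention_; }

 private:
  bool disable_flash_attention_{false};
};
```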
Renaming (duanqn, committed 1 year ago)
Check for T==float (duanqn, committed 1 year ago)
Lintrunner (duanqn, committed 1 year ago)
Remove mlas function (duanqn, committed 1 year ago)
Handle scale; Require present_k and present_v to be empty (duanqn, committed 1 year ago)
Check is_unidirectional_ (duanqn, committed 1 year ago)
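Together with "Check for T==float", the last few commits gate the new path behind runtime checks: float32 inputs only, no unidirectional (causal) masking on this path, empty present_k/present_v outputs, and a softmax scale that defaults to 1/sqrt(head_size) when not supplied. A hedged sketch of such a dispatch guard; the struct, field names, and exact condition set are assumptions, not the PR's code:

```cpp
#include <cmath>

// Hypothetical parameter bundle; field names are placeholders.
struct AttentionParams {
  bool is_float32;         // FlashAttention path assumed float-only
  bool is_unidirectional;  // causal masking not taken on this path
  bool has_present_k;      // present_k output must be empty
  bool has_present_v;      // present_v output must be empty
  float scale;             // 0 means "use the default 1/sqrt(head_size)"
  int head_size;
};

// Decide whether the tiled FlashAttention kernel may be used; otherwise the
// existing attention implementation remains the fallback.
inline bool CanUseFlashAttention(const AttentionParams& p) {
  return p.is_float32 &&
         !p.is_unidirectional &&
         !p.has_present_k &&
         !p.has_present_v;
}

// Resolve the softmax scale, falling back to 1/sqrt(head_size) when the
// attribute is left at its default.
inline float ResolveScale(const AttentionParams& p) {
  return p.scale != 0.0f ? p.scale
                         : 1.0f / std::sqrt(static_cast<float>(p.head_size));
}
```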
fix build (duanqn, committed 1 year ago)
Merge with mlas.h (duanqn, committed 1 year ago)
Add comment and MLASCALL (duanqn, committed 1 year ago)
Remove unnecessary change (duanqn, committed 1 year ago)
Fix onnxruntime_mlas.cmake (duanqn, committed 1 year ago)
Pick onnxruntime/test/python/transformers/benchmark_mha.py from latest master (duanqn, committed 1 year ago)
Disable FlashAttention by default (duanqn, committed 1 year ago)
Fix value choice of row_size_q and row_size_kv; Add comments (duanqn, committed 1 year ago)
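row_size_q and row_size_kv are the tile heights of the Q and K/V blocks, and later commits tie them to the L2 cache size. The sketch below is a reference illustration of the FlashAttention-style tiled pass with an online softmax for a single head; it shows the algorithm such a kernel computes, not the PR's MLAS code, and the cache-based sizing heuristic is an assumption:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Pick a tile height so a Q tile plus a K/V tile (plus the score tile) fit in
// L2; the divisor is a rough heuristic, not the PR's exact formula.
inline int PickRowSize(int l2_cache_bytes, int head_size) {
  int rows = l2_cache_bytes / (4 * static_cast<int>(sizeof(float)) * head_size);
  return std::max(1, rows);
}

// Reference FlashAttention-style pass for one head:
// O = softmax(scale * Q * K^T) * V, computed tile by tile with a running
// (online) softmax so the full seq_q x seq_k score matrix is never formed.
void FlashAttentionSingleHead(const float* Q, const float* K, const float* V,
                              float* O, int seq_q, int seq_k, int d,
                              float scale, int row_size_q, int row_size_kv) {
  for (int qs = 0; qs < seq_q; qs += row_size_q) {
    const int qb = std::min(row_size_q, seq_q - qs);
    std::vector<float> m(qb, -std::numeric_limits<float>::infinity());  // running max
    std::vector<float> l(qb, 0.0f);                                     // running sum
    std::vector<float> acc(static_cast<std::size_t>(qb) * d, 0.0f);     // unnormalized output

    for (int ks = 0; ks < seq_k; ks += row_size_kv) {
      const int kb = std::min(row_size_kv, seq_k - ks);
      for (int i = 0; i < qb; ++i) {
        // Scores for this (q row, K tile): s_j = scale * (q_i . k_j)
        std::vector<float> s(kb);
        float tile_max = -std::numeric_limits<float>::infinity();
        for (int j = 0; j < kb; ++j) {
          float dot = 0.0f;
          for (int c = 0; c < d; ++c) dot += Q[(qs + i) * d + c] * K[(ks + j) * d + c];
          s[j] = scale * dot;
          tile_max = std::max(tile_max, s[j]);
        }
        // Online softmax update: rescale the previous accumulator by
        // exp(m_old - m_new) before folding in the new tile.
        const float m_new = std::max(m[i], tile_max);
        const float correction = std::exp(m[i] - m_new);
        l[i] *= correction;
        for (int c = 0; c < d; ++c) acc[i * d + c] *= correction;
        for (int j = 0; j < kb; ++j) {
          const float p = std::exp(s[j] - m_new);
          l[i] += p;
          for (int c = 0; c < d; ++c) acc[i * d + c] += p * V[(ks + j) * d + c];
        }
        m[i] = m_new;
      }
    }
    // Final normalization by the accumulated softmax denominator.
    for (int i = 0; i < qb; ++i)
      for (int c = 0; c < d; ++c) O[(qs + i) * d + c] = acc[i * d + c] / l[i];
  }
}
```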
Fix order (duanqn, committed 1 year ago)
causal=False (duanqn, committed 1 year ago)
Add MLASCALL on implementation (duanqn, committed 1 year ago)
Improve comment (duanqn, committed 1 year ago)
Enable FlashAttention by default (duanqn, committed 1 year ago)
lintrunner -a (duanqn, committed 1 year ago)
Remove memset (duanqn, committed 1 year ago)
Fix l2_cache_size_ (duanqn, committed 1 year ago)
Fix PREfast (duanqn, committed 1 year ago)
#include <algorithm> (duanqn, committed 1 year ago)
Fix bug (duanqn, committed 1 year ago)
lintrunner (duanqn, committed 1 year ago)
Renaming (duanqn, committed 1 year ago)
Renaming (duanqn, committed 1 year ago)
Use MlasSgemmOperation (duanqn, committed 1 year ago)
Move threading inside MLAS kernel (duanqn, committed 1 year ago)
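Moving the threading inside the MLAS kernel lets the parallel partition follow the tiling: each task can own one (batch, head, Q-block) triple and write a disjoint slice of the output, so no synchronization is needed. A sketch of that partitioning with a generic parallel-for shim; both the shim and the task layout are assumptions standing in for the thread pool MLAS actually receives:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Generic stand-in for a thread-pool parallel-for; a real kernel would use
// the thread pool handed to MLAS instead of spawning threads per call.
inline void ParallelFor(std::ptrdiff_t task_count,
                        const std::function<void(std::ptrdiff_t)>& fn) {
  const unsigned n = std::max(1u, std::thread::hardware_concurrency());
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < n; ++t) {
    workers.emplace_back([&, t] {
      for (std::ptrdiff_t i = t; i < task_count; i += n) fn(i);
    });
  }
  for (auto& w : workers) w.join();
}

// Partition the attention workload so each task owns one (batch, head, Q-block)
// triple; compute_block receives (batch_index, head_index, q_start, q_rows) and
// writes a disjoint slice of the output, so the tasks need no synchronization.
void RunFlashAttentionTasks(
    int batch, int num_heads, int seq_q, int row_size_q,
    const std::function<void(int, int, int, int)>& compute_block) {
  const int q_blocks = (seq_q + row_size_q - 1) / row_size_q;
  const std::ptrdiff_t task_count =
      static_cast<std::ptrdiff_t>(batch) * num_heads * q_blocks;
  ParallelFor(task_count, [&](std::ptrdiff_t task) {
    const int qb = static_cast<int>(task % q_blocks);
    const int h = static_cast<int>((task / q_blocks) % num_heads);
    const int b = static_cast<int>(task / q_blocks / num_heads);
    const int q_start = qb * row_size_q;
    compute_block(b, h, q_start, std::min(row_size_q, seq_q - q_start));
  });
}
```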
Remove MLASCALL (duanqn, committed 1 year ago)
Remove 1 TODO (duanqn, committed 1 year ago)
Renaming (duanqn, committed 1 year ago)