onnxruntime
SkipSimplifiedLayerNorm + QuickGelu bfloat16 CUDA implementation
#24772
Merged


nenad1002 merged 9 commits into main from nebanfic/skip-bf16
nenad1002 skip bf16 impl
bc33c0ee
nenad1002 QuickGELU
2ea6b6bc
nenad1002 Remove packed bfloat16 op()
5fc32421
nenad1002 marked this pull request as draft 289 days ago
nenad1002 Align method name
5f9ace63
nenad1002 Format more
0135b97a
tianleiwu commented on 2025-05-15
tianleiwu commented on 2025-05-15
tianleiwu commented on 2025-05-15
nenad1002 Remove unused code + format
3eb0cbaf
nenad1002 Update docs
2192e448
nenad1002 Update operator kernel docs as well
9971838b
nenad1002 marked this pull request as ready for review 288 days ago
nenad1002 requested a review from tianleiwu 288 days ago
tianleiwu dismissed these changes on 2025-05-16
yuslepukhin commented on 2025-05-19
nenad1002 Use constexpr
2c29a56c
nenad1002 dismissed their stale review via 2c29a56c 285 days ago
yuslepukhin approved these changes on 2025-05-19
nenad1002 merged 99836802 into main 284 days ago
nenad1002 deleted the nebanfic/skip-bf16 branch 284 days ago

Assignees
No one assigned