SkipSimplifiedLayerNorm + QuickGelu bfloat16 CUDA implementation #24772
skip bf16 impl
bc33c0ee
QuickGELU
2ea6b6bc
Remove packed bfloat16 op()
5fc32421
nenad1002
marked this pull request as draft 289 days ago
Align method name
5f9ace63
Format more
0135b97a
Remove unused code + format
3eb0cbaf
Update docs
2192e448
Update operator kernel docs as well
9971838b
nenad1002
marked this pull request as ready for review 288 days ago
tianleiwu
dismissed these changes
on 2025-05-16
Use constexpr
2c29a56c
nenad1002
dismissed their stale review
via 2c29a56c
285 days ago
nenad1002
merged
99836802
into main 284 days ago
nenad1002
deleted the nebanfic/skip-bf16 branch 284 days ago
Assignees
No one assigned