transformers
84556e05 - remove q_norm/k_norm sharding and gather after projections

Commit
8 days ago
remove q_norm/k_norm sharding and gather after projections

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>