DeepSpeed
b84287e8 - Add weight copy to base mlp ln, separate norms from tensors in mlp/attn print statements

Commit
2 years ago