DeepSpeed
reduce all-to-all communication volume when both expert and non-expert are tensor-parallel
#5626
Merged
Commits: 9