DeepSpeed
7f2f4232 - Put Muon optimizer momentum buffer on GPU (#7648)

Commit
29 days ago
Put Muon optimizer momentum buffer on GPU (#7648)

This PR puts the Muon optimizer's momentum buffer on the GPU. Previously the buffer was kept on the CPU. With this change the Muon optimizer runs much faster: when fine-tuning Qwen2.5-3B on 2x A100 cards, iteration time drops from 1500 ms to 910 ms.

---------

Signed-off-by: Guokai Ma <guokai.ma@intel.com>
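As a quick sanity check on the figures quoted in the commit message, the two iteration times can be turned into a speedup factor and a percentage reduction. The variable names below are illustrative only and do not come from the DeepSpeed code base; the only inputs are the 1500 ms and 910 ms values reported above.

```python
# Iteration times reported in the commit message (milliseconds).
before_ms = 1500.0  # momentum buffer on CPU
after_ms = 910.0    # momentum buffer on GPU

# Speedup factor: how many times faster one iteration is after the change.
speedup = before_ms / after_ms

# Relative reduction in iteration time, as a percentage.
reduction_pct = (before_ms - after_ms) / before_ms * 100.0

print(f"speedup: {speedup:.2f}x")              # speedup: 1.65x
print(f"time reduction: {reduction_pct:.1f}%")  # time reduction: 39.3%
```

These numbers describe a single reported configuration (Qwen2.5-3B on 2x A100) and should not be read as a general guarantee for other models or hardware.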