DeepSpeed
Fix: Update grad norm calculation for CPU offload
#7302
Merged

sfc-gh-truwase merged 8 commits into deepspeedai:master from naveen/fix-cpu-offload
therealnaveenkamal Fix: Update grad norm calculation for CPU offload (92d30de3)
therealnaveenkamal requested a review from tjruwase 236 days ago
therealnaveenkamal requested a review from tohtana 236 days ago
sfc-gh-truwase commented on 2025-05-22
therealnaveenkamal Merge Upstream (17465324)
therealnaveenkamal Modified set_local_grad_for_param to update norm_for_param_grads (db5abed7)
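The commit above states the core change: `set_local_grad_for_param` now also updates `norm_for_param_grads`. The reasoning behind such a fix is that a global gradient norm is commonly assembled from per-parameter norms, ||g|| = sqrt(Σ_i ||g_i||²), so a cached per-parameter norm that is not refreshed when its gradient is replaced would feed a stale value into that sum. The pure-Python sketch below illustrates the idea under stated assumptions: the class `OffloadedGradNormTracker`, its dict-based storage and the helper `direct_grad_norm` are hypothetical, and only the names `set_local_grad_for_param` and `norm_for_param_grads` come from this PR. Their bodies here are illustrative, not DeepSpeed's code.

```python
import math


class OffloadedGradNormTracker:
    """Hypothetical toy model: gradients live in plain lists (standing in for
    CPU-offloaded buffers) and a per-parameter norm cache is kept alongside.
    This is NOT DeepSpeed's implementation."""

    def __init__(self):
        self.local_grads = {}           # param_id -> gradient values
        self.norm_for_param_grads = {}  # param_id -> cached L2 norm of that gradient

    def set_local_grad_for_param(self, param_id, grad):
        # Store the gradient and refresh its cached norm in the same step,
        # so the cache never describes an older gradient.
        self.local_grads[param_id] = list(grad)
        self.norm_for_param_grads[param_id] = math.sqrt(sum(g * g for g in grad))

    def global_grad_norm(self):
        # Global L2 norm from cached per-parameter norms: sqrt(sum_i ||g_i||^2).
        return math.sqrt(sum(n * n for n in self.norm_for_param_grads.values()))

    def direct_grad_norm(self):
        # Reference value recomputed from the stored gradients themselves.
        return math.sqrt(sum(g * g for grad in self.local_grads.values() for g in grad))


tracker = OffloadedGradNormTracker()
tracker.set_local_grad_for_param(0, [3.0, 4.0])   # ||g_0|| = 5
tracker.set_local_grad_for_param(1, [0.0, 12.0])  # ||g_1|| = 12
print(tracker.global_grad_norm())  # 13.0

# Replacing a gradient also replaces its cached norm, so both paths agree.
tracker.set_local_grad_for_param(0, [6.0, 8.0])   # ||g_0|| = 10
print(tracker.global_grad_norm(), tracker.direct_grad_norm())
```

The design choice shown is co-locating the cache update with the gradient write, so no code path that sets a gradient can skip refreshing its norm.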
therealnaveenkamal Unittest for CPU Offload Norm Grad Update (47b229f4)
therealnaveenkamal requested a review from loadams 235 days ago
therealnaveenkamal requested a review from sfc-gh-truwase 235 days ago
sfc-gh-truwase commented on 2025-05-23
therealnaveenkamal test: removed print and cleanup (d8cf050e)
therealnaveenkamal Merge branch 'master' of https://github.com/deepspeedai/DeepSpeed int… (2ac4eaf4)
therealnaveenkamal Fixed mpi4py dependency and formatting (b6bea01a)
therealnaveenkamal fix: port issue, handled exceptions and fp16 support (cb9e7237)
sfc-gh-truwase approved these changes on 2025-05-27
sfc-gh-truwase merged b9af5d8d into master 231 days ago

Assignees: No one assigned