DeepSpeed
[ROCm] Relax tolerances for FP8 unit test for fp16 and bf16 cases #7655
Merged
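The change described in the title follows a common testing pattern: comparison tolerances are chosen per dtype, with looser rtol/atol for the lower-precision fp16 and bf16 cases. The sketch below is not the DeepSpeed FP8 test touched by this PR; the test body, the stand-in round-trip operation, and the tolerance values are illustrative assumptions only.

```python
# A minimal, hypothetical sketch of per-dtype tolerance relaxation.
# NOT the DeepSpeed FP8 test changed in #7655; names and values are
# illustrative assumptions only.
import pytest
import torch

# Assumed tolerances; the values actually used in the PR may differ.
TOLERANCES = {
    torch.float16:  dict(rtol=1e-3, atol=1e-3),
    torch.bfloat16: dict(rtol=1e-2, atol=1e-2),  # bf16 stores only 7 mantissa bits
}

@pytest.mark.parametrize("dtype", [torch.float16, torch.bfloat16])
def test_low_precision_roundtrip(dtype):
    x = torch.randn(4096)
    # Stand-in for the operation under test: casting to the low-precision
    # dtype and back injects rounding error proportional to the dtype's
    # mantissa width, so the allowed error must depend on the dtype.
    y = x.to(dtype).to(torch.float32)
    assert torch.allclose(x, y, **TOLERANCES[dtype])
```

FP8 formats have even fewer mantissa bits, so a test that checks FP8-quantized results against an fp16 or bf16 baseline generally needs looser tolerances than the fp32 case; the ROCm-specific relaxation in this PR follows the same idea.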

rraminen requested a review from tjruwase 61 days ago
rraminen requested a review from loadams 61 days ago
rraminen requested a review from tohtana 61 days ago
jithunnair-amd commented on 2025-10-30
rraminen marked this pull request as draft 58 days ago
rraminen marked this pull request as ready for review 48 days ago
rraminen marked this pull request as draft 44 days ago
rraminen marked this pull request as ready for review 34 days ago
80e6f533  rraminen: Relax tolerance
160730e4  stas00: ALST/UlyssesSP: more intuitive API wrt variable seqlen (#7656)
b01045b7  rraminen: Fix misplaced overflow handling return in fused_optimizer.py (#7645)
a21a0747  therealnaveenkamal: [bug]: fixed comm_dtype in extra_large_param_to_reduce (#7660)
dd2e1474  stas00: UlyssesSP: TiledMLP doc - recomputes forward twice (#7664)
56ca87be  therealnaveenkamal: resolved a 0-dim tensor slicing bug from _get_state_without_padding (…
b3fa61f2  kunheek: Fix typo in pytorch-profiler.md documentation (#7652)
014ee5fc  sfc-gh-truwase: README refresh (#7668)
d324f97d  loadams: Update version.txt after release (#7675)
2d8d5238  stas00: [modal ci] fixes (#7676)
97301535  stas00: leaf modules: explain better (#7674)
24990451  stas00: disable nv-lightning-v100.yml cI (#7681)
08c6d1de  delock: allow seperate learning rate "muon_lr" and "adam_lr" for muon optimiz…
a0fde72a  stas00: see_mem_usage: make always work (#7688)
0387a0a1  stas00: make debug utils more resilient (#7690)
c69ab198  stas00: stage 1-2: don't pin memory if not configured (#7689)
7a94820a  stas00: modal ci: fix group concurrency (#7691)
f00b3887  Emrys-Merlin: Use pytorch utils to detect ninja (#7687)
767fe524  loadams: Update SECURITY.md to point to GitHub reporting rather than Microsoft…
e96064e3  delock: Add Qwen2.5 to AutoTP model list (#7696)
2972ef84  tohtana: Trust intel server for XPU tests (#7698)
a7ea3f6c  tohtana: PyTorch-compatible backward API (#7665)
rraminen force pushed from 1089fa4f to a7ea3f6c 27 days ago
rraminen requested a review from jomayeri 27 days ago
ae3ae053  sfc-gh-truwase: Merge branch 'master' into relax_tol_testFP8_ROCm
sfc-gh-truwase commented on 2025-12-02
72d4e997  sfc-gh-truwase: Apply suggestion from @sfc-gh-truwase
sfc-gh-truwase approved these changes on 2025-12-02
sfc-gh-truwase merged 28fbb808 into master 25 days ago
